Framework for Social Media Analysis Based on Hashtag Research

Social networks have become a common part of many people’s daily lives. Users spend more and more time on these platforms and create an active and passive digital footprint through their interactions with other subjects. These data have high research potential in many fields, because understanding people’s communication on social media is essential to understanding their attitudes, experiences and behaviours. Social media analysis is a relatively new subject, and there is still a need to develop methods and tools that help researchers solve the typical problems associated with this area, so that they can focus entirely on the subject of their research. This article describes the Social Media Analysis based on Hashtag Research (SMAHR) framework, which uses social network analysis methods to explore social media communication through a network of hashtags. The results show that social media analysis based on hashtags provides information applicable to theoretical research and to practical strategic marketing and management applications.


Introduction
Social networks have become a common part of many people's daily lives and a space where individual users share information [1]. Currently, there are about 3.6 billion social network users, and according to forecasts, more than 4.41 billion people will use social networks in 2025 [2]. Compared with the projected global population for 2025 (8,184,437,460 inhabitants) [3], this means that social networks will be used by about 54% of the world population.
In contrast to traditional mass media, such as television, newspapers and radio, content consumers on social networks such as Facebook, Twitter and Instagram have become co-creators of communication [4,5]. Besides exchanging approximately 60 billion private messages every day, users create over 500 million tweets per day on Twitter and upload over 95 million photos per day to Instagram; on Facebook, they update more than a billion statuses, post over 1.8 billion comments and upload about 480 million photos per day [6]. Given the volume of data that users create through these interactions, social media analysis is a fast-growing research area aimed at extracting useful information [7], and it has the potential to keep growing, both in terms of the number of social network users [8] and the size of their digital footprint [9].

Theoretical Background
Nowadays, the terms social network and social media are heavily intertwined and often used interchangeably. This can be seen in the case of Facebook, which is described as a social media platform in references [33,34] and as a social network in references [35,36]; Facebook is thus an example of the overlap between the two areas. Reference [37] defines social networks as "web-based services that allow individuals to (1) construct a public or semi-public profile within a bounded system, (2) articulate a list of other users with whom they share a connection, and (3) view and traverse their list of connections and those made by others within the system." Reference [38] defines social media as "Internet-based channels that allow users to interact opportunistically and selectively self-present, either in real-time or asynchronously, with both broad and narrow audiences who derive value from user-generated content and the perception of interaction with others." In line with these concepts, this article uses the term social network for a platform that enables connections between individual users, and the term social media for the possibilities this network offers for mass information sharing. We follow the research [39], which defines online social networking as a process whose goal is to allow people to connect with each other. Social networks are tools for sharing content [40].
More on the difference between social media and social networks can be found in the research of Rhee (2021) [41], which focuses specifically on this issue.

Social Network Analysis vs. Social Media Analysis
Having defined the terms social network and social media, it is also necessary to distinguish between social network analysis (SNA) and social media analysis (SMA).
Social network analysis examines social structures using network theory and graph theory [42]. Its growing importance is reflected in the number of articles in the ScienceDirect database with the keywords "Social Network Analysis" in the title, which increased from 251 in 2010 to 1259 in 2020.
Social media analysis aims to collect, monitor, analyse, summarise, and visualise data from social media in order to identify useful patterns and information [43]. It is a process of collecting data from communication on digital media and processing them into structured views that support better business decisions [44]. Reference [45] defines social media analysis as a process of obtaining unstructured data from social media and analysing these data to support business decision-making. This analysis is useful for product development, product innovation, product utilisation, brand engagement, competitive intelligence, and general marketing [46].

Social Media Analysis Based on Hashtag Research Framework
The SMAHR framework is based on the theoretical process of knowledge discovery in databases (KDD) [47]. The KDD process comprises seven basic activities (see Figure 1). As with the KDD process, the SMAHR framework does not include obtaining the data from social media. For this part, it is necessary to use specialized software, such as Netlytic [48], or to obtain the data through the APIs of the individual social networks.

1.
Data Cleaning: removing irrelevant, confusing or misleading data from a data file [49]. At this stage, it is necessary to understand the context of the hashtag and the various situations in which it is used, because the data to be removed mainly come from hashtags that can be used in various contexts. An example is reference [14], which focused on corporate social responsibility. Netlytic software [48] was used to download the data, with the condition that downloaded messages contain the hashtag #CSR (case-insensitive). In total, 1,172,868 Instagram posts were downloaded. Based on the analysis of communities (see phase 5, Data Mining), a community dealing with the computer game "CSR Racing", which also uses the hashtag #CSR, was identified and extracted. This step is critical to the scope of data mining: off-topic messages must be removed.
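A minimal Python sketch of this cleaning step, assuming that the off-topic community has already been identified (for instance through community analysis) and can be characterized by a set of co-occurring hashtags, such as the hypothetical `#csrracing`. A hashtag is assumed to be "#" followed by word characters; messages containing any blocked hashtag are dropped.

```python
import re

# Assumption: a hashtag is "#" followed by letters, digits or underscores.
HASHTAG = re.compile(r"#\w+")

def remove_off_topic(messages, off_topic_tags):
    """Drop every message that contains at least one off-topic hashtag.

    off_topic_tags is a hypothetical, manually curated set of hashtags that
    mark an unrelated context (e.g. a game that reuses the study hashtag).
    """
    blocked = {tag.lower() for tag in off_topic_tags}
    kept = []
    for text in messages:
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        if tags.isdisjoint(blocked):   # no overlap with the blocked set
            kept.append(text)
    return kept

posts = [
    "Our annual #CSR report is out #sustainability",
    "New car unlocked! #CSR #CSRRacing #racing",
]
print(remove_off_topic(posts, {"#csrracing"}))
# ['Our annual #CSR report is out #sustainability']
```

Filtering on a co-occurring hashtag rather than on #CSR itself keeps the on-topic posts that share the ambiguous tag.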

2.
Data Integration: a process defined as data homogenization [50]. The essence of data integration is the creation of a homogeneous data set, with which it is possible to proceed to the third phase, Data Selection. Because the APIs of individual social networks are set up differently, it is necessary to create a table structure for further work containing the following items:
• ID: the identifier of an item in the data file. It is not necessary for further processing, but it makes it easier to navigate the overall data set;
• Author: the identifier of the author of a particular message. This information is important for identifying the number of unique users in the data file; for example, if data have been downloaded throughout the year, one user will likely comment on the topic more than once;
• Message: the text part of the message. If a social network allows hashtags to be inserted into other parts of the message (some social networks have a separate message title and message body), these parts must be merged into one item;
• Location: the identification of the place from which the message was sent or of the user's residence. This information can be used to identify regional differences (more in phase 5, Data Mining);
• Date: the time at which the message was sent. This is a very important piece of information that shows whether messages were captured throughout the period and that there is no outage in the data file. It can also reveal seasonality in certain regions, which is useful, for example, in the field of farmers' markets: in Central Europe, customers discussing farmers' markets in spring mention different products than in summer and autumn [12].
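As an illustration, the homogeneous table can be sketched in Python as one record type with ID, Author, Message, Location and a date/time of posting. The two input formats below (`from_platform_a`, `from_platform_b` and their field names) are hypothetical stand-ins for differently structured API responses, not real social network APIs.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    """One row of the homogeneous table used by the later phases."""
    id: str
    author: str
    message: str
    location: Optional[str]
    date: str  # ISO 8601 timestamp of the post

def from_platform_a(raw: dict) -> Record:
    # Hypothetical platform with separate title and body fields: both may
    # contain hashtags, so they are merged into a single Message field.
    text = " ".join(part for part in (raw.get("title"), raw.get("body")) if part)
    return Record(str(raw["id"]), raw["user"], text, raw.get("place"), raw["created"])

def from_platform_b(raw: dict) -> Record:
    # Hypothetical platform with a single caption and an optional location.
    return Record(str(raw["pk"]), raw["owner"], raw.get("caption", ""),
                  raw.get("location"), raw["timestamp"])

records = [
    from_platform_a({"id": 1, "user": "u1", "title": "Market day",
                     "body": "#farmersmarket #fresh", "place": "Prague",
                     "created": "2020-05-02T09:00:00"}),
    from_platform_b({"pk": 7, "owner": "u1", "caption": "#organicfood",
                     "timestamp": "2020-07-15T18:30:00"}),
]
print(records[0].message)                 # Market day #farmersmarket #fresh
print(len({r.author for r in records}))   # 1 unique author
```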

3.
Data Selection: a process during which the researcher decides which data are relevant for further analysis. The SMAHR framework focuses on hashtag analysis, so at this point all text that is not a hashtag must be removed from the Message field (see the Data Integration phase), i.e., any text that does not start with "#". The Hashtag Matcher 1.2 module, described in more detail in the next phase (Data Transformation), can be used for this; the two phases are closely related in this framework.
Message sending location: depending on its aim, research can be divided into two types:
• An analysis of communication from a global perspective, without regional differences. All messages are used.
• An analysis of a particular region, or an analysis of regional differences, if a study focuses on identifying regional differences or on a specific region. In this case, the data must be filtered based on the values in the Location field (see the Data Integration phase) so that only messages whose location falls within the studied region(s) are selected.
An example is reference [14], which focused on identifying differences in corporate social responsibility between developed and developing countries based on the selected hashtag #CSR. In this study, 24,339 messages (21.42%) that contain information about the message's location were selected from the basic set of 113,628 Instagram posts. A total of 9712 messages were sent from developed countries and 14,627 from developing countries. Thanks to the previous step (Data Integration), this filtering can be applied, for example, in MS Excel, Apache OpenOffice, or another spreadsheet program.
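The same location filter can also be scripted instead of done in a spreadsheet. The sketch below assumes that the Location field already holds a country name and uses a small hypothetical lookup table from country to region group; records without a location, or outside the lookup, are left out, as in the selection of located posts described above.

```python
def select_by_region(records, region_of):
    """Keep records whose location maps to a studied region, grouped by region.

    region_of is a hypothetical lookup from a location value (here a country
    name) to a region label; records without a location, or with a location
    outside the lookup, are left out.
    """
    groups = {}
    for rec in records:
        region = region_of.get(rec.get("location"))
        if region is not None:
            groups.setdefault(region, []).append(rec)
    return groups

# Illustrative lookup and records only.
region_of = {"Germany": "developed", "India": "developing", "Kenya": "developing"}
records = [
    {"message": "#csr", "location": "Germany"},
    {"message": "#csr #india", "location": "India"},
    {"message": "#csr", "location": None},
    {"message": "#csr #kenya", "location": "Kenya"},
]
groups = select_by_region(records, region_of)
print({region: len(rows) for region, rows in groups.items()})
# {'developed': 1, 'developing': 2}
```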

4.
Data Transformation: transforming the data into the form required by the data mining system, i.e., preparing the data for the software that will work with them. The method of data transformation depends on the software used for data mining. The SMAHR framework recommends Gephi [51], currently in version 0.9.2, which has its own module for importing data saved in CSV format. If a study focuses on regional differences or on a specific region, the data must be filtered by the selected region, so they must first be transformed into a form in which the region can be identified. This depends very much on the APIs of individual social networks, which provide location information either as text in an address format or as geographic coordinates (latitude and longitude). The next significant step in this phase is to transform the rest of the message into a format suitable for the data mining software. Here, it is possible to use the Hashtag Matcher 1.2 software (see the Supplementary Materials to this article), which has two primary functions. First, it modifies the hashtags in a message into the form needed by Gephi:
• All text is converted to lowercase, because #prague and #Prague are two different hashtags for Gephi;
• If two or more hashtags are joined together, for example, #farmersmarket#organicfood#fresh, they are separated by spaces into #farmersmarket #organicfood #fresh so that the program detects three hashtags.
Second, for a large data file, it removes hashtags that occur fewer times in the entire data file than a specified value. This mainly removes typos; for example, all hashtags that occur only once in the data file can be removed.
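The lowercasing, splitting of joined hashtags and frequency filtering described above can be sketched in a few lines of Python. This is not the Hashtag Matcher 1.2 code, only a hypothetical reimplementation of the same three steps; it assumes a hashtag is "#" followed by word characters, so hashtags containing other characters (e.g. hyphens) would be cut at that character.

```python
import re
from collections import Counter

# "#" followed by word characters; this also splits glued tags such as
# "#farmersmarket#organicfood#fresh", because "#" is not a word character.
HASHTAG = re.compile(r"#\w+")

def extract_hashtags(message):
    """Keep only the hashtags of a message, lowercased (#Prague -> #prague)."""
    return [tag.lower() for tag in HASHTAG.findall(message)]

def normalize(messages, min_count=2):
    """Return one space-separated hashtag string per message, dropping
    hashtags that occur fewer than min_count times in the whole data set."""
    tagged = [extract_hashtags(m) for m in messages]
    counts = Counter(tag for tags in tagged for tag in tags)
    return [" ".join(t for t in tags if counts[t] >= min_count) for tags in tagged]

msgs = [
    "Saturday at the market #FarmersMarket#organicfood#fresh",
    "#farmersmarket haul #fresh #Prague",
    "#farmersmarket #fersh",   # the typo #fersh occurs only once
]
print(normalize(msgs))
# ['#farmersmarket #fresh', '#farmersmarket #fresh', '#farmersmarket']
```

With `min_count=2`, every hashtag that appears only once in the data file is removed, which corresponds to the typo filter mentioned above.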
Example of the Hashtag Matcher 1.2 parameters:
java -jar java-parserv3.jar -i = inputfile.csv -o = outputfile.csv -l = 100 -m = HashtagsMatcher -csv-data-column = 0
Based on the previous steps of Data Integration and Data Transformation, the data can be imported into the data mining software. The essence of the data import is the creation of a network of hashtags based on their co-occurrence in individual messages across the entire data file, as seen in Figure 2. The weight of an edge corresponds to the number of links between the given hashtags.
Figure 2. Transformation example of hashtags from messages to the network (case study of corporate social responsibility). Source: reference [14].
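The network construction described above can be sketched as a co-occurrence count: every unordered pair of hashtags appearing in the same message adds 1 to the weight of the edge between them. The output is an edge table with Source, Target, Type and Weight columns, which, to our understanding, Gephi's spreadsheet (CSV) import recognizes; the exact import settings should be checked against the Gephi version in use.

```python
import csv
import io
from collections import Counter
from itertools import combinations

def cooccurrence_edges(hashtag_lists):
    """Count, for every unordered pair of hashtags, the number of messages in
    which both appear; that count becomes the edge weight."""
    weights = Counter()
    for tags in hashtag_lists:
        unique = sorted(set(tags))             # ignore repeats within one message
        for a, b in combinations(unique, 2):   # each pair once, in sorted order
            weights[(a, b)] += 1
    return weights

def to_edge_csv(weights):
    """Write an undirected edge table with Source, Target, Type, Weight."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (a, b), w in sorted(weights.items()):
        writer.writerow([a, b, "Undirected", w])
    return out.getvalue()

messages = [["#csr", "#sustainability", "#business"], ["#csr", "#sustainability"]]
print(to_edge_csv(cooccurrence_edges(messages)))
# Source,Target,Type,Weight
# #business,#csr,Undirected,1
# #business,#sustainability,Undirected,1
# #csr,#sustainability,Undirected,2
```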

Language homogenization
If a study focuses on global communication or on differences between regions, it is necessary to homogenize the hashtags into one selected language.
An example is the case study of #zerowaste: it is necessary to convert the hashtag #bhayplastik (Hindi) to #bioplastic (English), and the hashtag #lebensmittelverschwendung (German) to #foodwaste (English).
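A minimal sketch of this step is a lookup table applied to each message's hashtags. The table below is hypothetical apart from the two translations given in the example; in practice it would have to be compiled for the languages present in the data. Duplicates created by translation (a message tagged with both #foodwaste and #lebensmittelverschwendung) are collapsed so they do not inflate edge weights later.

```python
# Hypothetical translation table; the two entries are the examples above.
TRANSLATIONS = {
    "#bhayplastik": "#bioplastic",
    "#lebensmittelverschwendung": "#foodwaste",
}

def homogenize(tags, table=TRANSLATIONS):
    """Translate hashtags via the table, keep unknown ones unchanged, and drop
    duplicates created by translation (first-occurrence order is kept)."""
    translated = (table.get(tag, tag) for tag in tags)
    return list(dict.fromkeys(translated))

print(homogenize(["#zerowaste", "#lebensmittelverschwendung", "#foodwaste"]))
# ['#zerowaste', '#foodwaste']
```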


5.
Data Mining: specific techniques used to extract potentially useful patterns. The SMAHR framework builds on the Gephi program (currently version 0.9.2). Within this framework, the software's data mining techniques are used in three basic areas:
• Analysis of characteristics at the level of individual hashtags;
• Analysis of network characteristics;
• Network visualization.

Analysis of characteristics at the level of individual hashtags
The following methods can be used to analyse characteristics at the level of individual hashtags:

Degree Centrality
In graph theory and social network analysis, this characteristic is also called the degree of a vertex. Degree centrality measures the importance of a vertex by its degree, i.e., the number of edges connecting it to other vertices. Here, it is necessary to distinguish whether the graph is directed or undirected. In hashtag analysis within the SMAHR framework, we use an undirected graph, because it is not possible to determine the direction of the interaction between individual hashtags in a message.
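For an undirected hashtag network stored as a weighted edge list (each pair once), the degree and the weighted degree can be computed as below. This is an illustrative sketch; to our understanding Gephi reports both variants among its statistics, but the code does not reproduce Gephi itself.

```python
from collections import defaultdict

def degrees(edges):
    """Degree and weighted degree of each hashtag.

    edges: {(a, b): weight} with each undirected pair stored once, as
    produced by counting co-occurrences over sorted hashtag pairs.
    """
    degree = defaultdict(int)     # number of distinct neighbouring hashtags
    weighted = defaultdict(int)   # sum of the weights of incident edges
    for (a, b), w in edges.items():
        for node in (a, b):
            degree[node] += 1
            weighted[node] += w
    return dict(degree), dict(weighted)

edges = {("#business", "#csr"): 1, ("#business", "#sustainability"): 1,
         ("#csr", "#sustainability"): 2}
degree, weighted = degrees(edges)
print(degree["#csr"], weighted["#csr"])   # 2 3
```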

Eigenvector Centrality
Eigenvector centrality is an extension of basic degree centrality and is also a measure of the influence of a hashtag in the network. It is based on the assumption that connections to high-scoring hashtags contribute more to a hashtag's score than connections to equally or less highly scored hashtags. A high eigenvector centrality score therefore means that a hashtag is linked to many hashtags that themselves have high scores. The score x_v of hashtag v is given by Equation (1):
$$x_v = \frac{1}{\lambda} \sum_{t \in M(v)} x_t = \frac{1}{\lambda} \sum_{t \in V} a_{v,t} x_t \qquad (1)$$
where M(v) denotes the set of nodes adjacent to v, a_{v,t} is the corresponding entry of the adjacency matrix A (1 if v and t are linked, 0 otherwise), and λ is the largest eigenvalue of A. In vector form, the eigenvector x can be expressed by Equation (2):
$$\mathbf{A}\mathbf{x} = \lambda \mathbf{x} \qquad (2)$$
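A minimal power-iteration sketch of eigenvector centrality, assuming an unweighted, connected, undirected graph stored as adjacency sets. It iterates with A + I rather than A; both matrices have the same eigenvectors, but the shift prevents the oscillation that plain iteration shows on bipartite graphs such as a star. Scores are scaled to unit Euclidean length; other tools may scale differently (e.g. so that the maximum is 1).

```python
import math

def eigenvector_centrality(adj, iterations=1000, tol=1e-10):
    """Power iteration for A x = lambda x on an undirected graph.

    adj: {node: set(neighbours)}. Each step replaces x_v by x_v plus the sum
    of its neighbours' scores (i.e. multiplies by A + I) and renormalizes.
    """
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        new = {v: x[v] + sum(x[t] for t in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        if sum(abs(new[v] - x[v]) for v in adj) < tol:   # converged
            return new
        x = new
    return x

# A star: one central hashtag linked to three peripheral ones.
star = {"#food": {"#onion", "#fresh", "#local"},
        "#onion": {"#food"}, "#fresh": {"#food"}, "#local": {"#food"}}
scores = eigenvector_centrality(star)
```

For the star, λ = √3 and the centre scores √3 times as much as each leaf, i.e. 1/√2 versus 1/√6 after normalization.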

Betweenness Centrality
The betweenness centrality of a hashtag is high when many of the shortest paths between other pairs of hashtags in the network pass through it, and it is highest when the paths between any two hashtags always pass through this hashtag. Hashtags with high betweenness centrality can be referred to as network bottlenecks [52]. They are important because they act as interconnectors, or bridges, between remote parts of the network. This metric is particularly useful in networks in which the individual communities are less interconnected with each other than the hashtags within a given community, because it identifies the hashtags that act as bridges between these communities. The betweenness centrality of hashtag v_i in the graph G := (V, E) is calculated as follows:
$$C_B(v_i) = \sum_{s \neq v_i \neq t \in V} \frac{\sigma_{st}(v_i)}{\sigma_{st}}$$
where σ_st is the number of shortest paths from node s to node t, and σ_st(v_i) is the number of shortest paths from s to t that pass through the node v_i.
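The sum over all node pairs is usually computed with Brandes' algorithm rather than by enumerating paths. The sketch below is an unweighted, undirected version in plain Python; it counts each unordered pair (s, t) once, i.e. half of the sum over ordered pairs, which is a common convention for undirected graphs. The two-triangle example is illustrative only.

```python
from collections import deque

def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj: {node: set(neighbours)}. Each unordered pair (s, t) is counted once.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # 1) Breadth-first search from s: count shortest paths (sigma)
        #    and record each node's predecessors on those paths.
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # 2) Walk back from the farthest nodes, accumulating the share of
        #    shortest paths from s that pass through each node.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair was visited from both ends.
    return {v: score / 2 for v, score in bc.items()}

# Two triangles {a, b, c} and {d, e, f} joined by the bridge c-d.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
         "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
bc = betweenness_centrality(graph)
```

Here c and d each lie on the only shortest path of 6 pairs and score 6.0, while every other node scores 0.0, which matches the "bridge" interpretation above.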

Modularity
Most complex networks contain groups of nodes that are more interconnected with each other than with the rest of the network. Groups of such nodes are called communities [53]. Modularity is an index that measures the cohesion of communities within a particular network [54]; the idea is to identify communities of nodes that are interconnected with each other to a greater degree than with the others. Networks with high modularity have strong links between nodes inside modules but weaker links between nodes in different modules [55]. The component analysis then identifies the number of distinct parts (communities) in the network using modularity-based community detection [56], which evaluates the change in modularity ΔQ when hashtag i is moved into community C:
$$\Delta Q = \left[\frac{\Sigma_{in} + 2k_{i,in}}{2m} - \left(\frac{\Sigma_{tot} + k_i}{2m}\right)^2\right] - \left[\frac{\Sigma_{in}}{2m} - \left(\frac{\Sigma_{tot}}{2m}\right)^2 - \left(\frac{k_i}{2m}\right)^2\right]$$
where Σ_in is the sum of the weights of the links inside community C (each link counted from both of its ends), Σ_tot is the sum of the weights of the links incident to hashtags in C, k_i is the sum of the weights of the links incident to hashtag i, k_{i,in} is the sum of the weights of the links from i to hashtags in C, and m is the normalizing factor, the sum of the weights of all links in the graph.
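The sketch below computes the modularity Q of a given partition directly and, separately, the standard Louvain-style gain for moving an isolated node i into a community C. It assumes an undirected weighted edge list without self-loops; the example network (two triangles joined by one edge) is illustrative. The final check shows that the gain expression equals the difference between Q after and before the move.

```python
def _degrees(edges):
    deg = {}
    for (u, v), w in edges.items():
        deg[u] = deg.get(u, 0) + w
        deg[v] = deg.get(v, 0) + w
    return deg

def modularity(edges, communities):
    """Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where L_c is
    the internal link weight and d_c the total degree of community c."""
    m = sum(edges.values())
    deg = _degrees(edges)
    q = 0.0
    for comm in communities:
        internal = sum(w for (u, v), w in edges.items() if u in comm and v in comm)
        total = sum(deg.get(n, 0) for n in comm)
        q += internal / m - (total / (2 * m)) ** 2
    return q

def modularity_gain(edges, comm, i):
    """Delta Q for moving the isolated node i into community comm."""
    m2 = 2 * sum(edges.values())
    deg = _degrees(edges)
    # Sigma_in counts each internal link from both ends, hence the factor 2.
    sigma_in = 2 * sum(w for (u, v), w in edges.items() if u in comm and v in comm)
    sigma_tot = sum(deg[n] for n in comm)
    k_i = deg[i]
    k_i_in = sum(w for (u, v), w in edges.items()
                 if (u == i and v in comm) or (v == i and u in comm))
    after = (sigma_in + 2 * k_i_in) / m2 - ((sigma_tot + k_i) / m2) ** 2
    before = sigma_in / m2 - (sigma_tot / m2) ** 2 - (k_i / m2) ** 2
    return after - before

pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
edges = {pair: 1 for pair in pairs}
q = modularity(edges, [{"a", "b", "c"}, {"d", "e", "f"}])   # 5/14, about 0.357
gain = modularity_gain(edges, {"a", "b"}, "c")
```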

Network Visualization
The goal of network visualization is to identify individual communities and their relative positions. After the data are imported into Gephi, the network is displayed within a basic square, without visualizing the different relationships between individual hashtags, as seen in Figure 3. This visualization is unsatisfactory for identifying communities and their mutual positions, but it has no effect on the analysis of characteristics at the level of hashtags or of the whole network.
Based on the modularity analysis, individual hashtags are assigned colours corresponding to the groups extracted by this algorithm, as seen in Figure 4. It is now possible to determine the number of communities and their representation through component analysis. However, this visualization is still insufficient for identifying the location of individual communities. For this purpose, the following algorithms, which are part of programs for the visualization and analysis of network data, can be used:
• ForceAtlas, an algorithm used for networks based on the so-called small-world network. This is a type of mathematical graph in which most nodes are not neighbours, but the neighbours of any given node are likely to be neighbours of each other, and most nodes can be reached from every other node by a small number of connections [57];
• ForceAtlas2, an improved version of the ForceAtlas algorithm that focuses on large networks. It is a method based on a visual representation of reduced samples, used to define network communities and their types [58]. Its advantage over ForceAtlas is its speed and ease of computation. The ideal number of hashtags is 10,000-100,000 [59].
To visualize a network with 1000 to 10,000 hashtags, it is best to use the ForceAtlas 2 algorithm due to its calculation speed. The result of the ForceAtlas 2 algorithm is, for example, the layout in Figure 5, where it is possible to identify five main communities and their mutual position (polarity).

6.
Data Evaluation: the main aim is to identify specific patterns that represent knowledge, based on the results of the previous steps. Especially important in this phase is the interpretation of the values of hashtag degree, eigenvector centrality and modularity, together with the visual assessment of the distribution of communities in the network.

Visual Assessment of the Layout of Communities in the Network
Based on the modularity value, the hashtag degree values and the network visualization, communities are identified, and it is possible to characterize the communities in a reduced sample. For this process, it is possible to use the typology of communities proposed in [58], which was based on communication and the forwarding of messages between members and contains six basic types of community. However, hashtag analysis cannot determine the direction of a hashtag's relationship to the other hashtags in a message. Therefore, only the two types that are not conditioned by the direction of the relationship between individual communities are selected and described here (see Figure 6). In [58], these communities are made up of people; in the SMAHR framework, people are replaced by hashtags.
Figure 6. Basic communities of users on social networks. Adapted from: "Mapping Twitter Topic Networks: From Polarized Crowds to Community Clusters" [58].
Divided Communities: polarized communities arise when two groups discuss the same topic but communicate little with each other. An example is reference [20], which focused on the topic of gamification; five communities were extracted there, as seen in Figure 5.
Tight Communities: these communities can be found, for example, at conferences, in discussions on professional topics and in other interest groups that attract similar people. There are different groups of conversations in these networks, but people are closely linked, even between groups. An example is reference [12], which examines farmers' markets; four very strongly interconnected communities were extracted (see Figure 7).

2021, 11, x FOR PEER REVIEW
The same type of community has been identified in research focusing on organic food [13].

7.
Knowledge Representation: a technique that uses visualization tools to represent the results of data mining. Knowledge representation is based on the synthesis of the individual values and outputs of the data evaluation phase, and this is where the research questions can be answered. Based on the network size, network density, hashtag frequencies and centralities in individual communities, it is possible to identify the essence of the network and of the individual communities (the focus of each community's communication), and, together with an analysis of the visual layout of the communities in the network, to determine the relationships between individual communities. Follow-up qualitative research should then ask why the communities are laid out in this way (e.g., why they are polarized).
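Network size and density, two of the values mentioned above, follow directly from the edge list. The sketch below assumes an undirected network with each pair stored once and uses the usual undirected density, the number of existing edges divided by the n(n-1)/2 possible ones; the example hashtags are illustrative.

```python
def network_summary(edges):
    """Size and density of an undirected network given as {(a, b): weight}."""
    nodes = {node for pair in edges for node in pair}
    n, e = len(nodes), len(edges)
    # Undirected density: existing edges / possible edges n(n-1)/2.
    density = 2 * e / (n * (n - 1)) if n > 1 else 0.0
    return {"nodes": n, "edges": e, "density": density}

edges = {("#farmersmarket", "#fresh"): 3, ("#fresh", "#organicfood"): 1,
         ("#local", "#organicfood"): 2}
print(network_summary(edges))   # {'nodes': 4, 'edges': 3, 'density': 0.5}
```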

Limitations of the Framework
In the field of knowledge representation, it is necessary to draw attention to the limits of this method:
Regional Differences: for analyses that focus on identifying attitudes without regional differences, see references [13,54]. This area must be mentioned among the limitations of such research, because regional differences affect the results mainly through a disproportionate number of messages from different regions. An example is reference [14], which identified differences in corporate social responsibility between developed and developing countries based on the selected hashtag #CSR. In this study, 24,339 messages (21.42%) that contain information about the location of the message were selected from a basic set of 113,628 Instagram posts. Of these, 9712 messages were sent from developed countries and 14,627 from developing countries. If the analysis had been conducted from a global perspective, the results would have been skewed by the larger volume of communication from developing countries, which produced about 50% more messages than developed countries. See Table 1.
Focus only on hashtags: the whole SMAHR framework analyses only a specific part of the content, i.e., hashtags. Hashtags do not have to fully cover the meaning of a message; rather, they are used to express or emphasize something that is not expressed in the text of the message, such as emotions, mood, location, and/or political inclination [60]. Conversely, analyses that focus on text content without hashtag analysis have an analogous limitation. When interpreting the results, this limitation must be stated among the limitations of the research.
Selected social network: each social network has different target users. Users differ in age structure, the technologies they use to connect (mobile vs. desktop), education, etc. These factors affect the resulting attitudes of users.
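For the Regional Differences limitation, the figures quoted for reference [14] can be checked with a few lines of Python. The last line, the share of developing-country posts in a pooled global sample, is a figure derived here from the quoted counts, not one reported in [14].

```python
developed, developing = 9712, 14627   # located posts per group, from [14]
total_posts = 113628                  # basic set of Instagram posts

located = developed + developing
print(located)                                   # 24339
print(f"{located / total_posts:.2%}")            # 21.42%
print(f"{developing / developed - 1:.1%}")       # 50.6% more from developing countries
print(f"{developing / located:.1%}")             # 60.1% of a pooled, global sample
```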
one of the limitations of this work. The uniqueness of this framework lies mainly in the fact that it covers social media analysis but uses the methods of social network analysis. With this specific focus, SMAHR can be used as an alternative framework in the field of social media analysis, where most frameworks focus primarily on semantic and sentiment analysis [30,61-64]. It thus provides a tool for research triangulation, as previous work has already confirmed [28].
As mentioned above, hashtags are a specific part of communication through which users express experience, attitudes, opinions, and values [30,60,65]. Thus, the user can use hashtags to express characteristics that may not be obvious from the post alone [60,66]. For this reason, hashtags are very often inserted at the end of a message (see Figure 8 for an example).
Figure 8 shows an example from Instagram [67], which was filtered based on the hashtag #onions. The message contains three main parts for analysis, as follows: (1) image, (2) text, and (3) hashtags.
(1) Image: Machine learning models for image classification can detect objects (e.g., vegetables, onions, cucumbers, cars, buildings) and context (e.g., situations and associations such as natural food or a celebration). In this case, the Google Cloud Vision API [68] was used. This tool has been used, for example, to extract brand information from social networks [69] and to recognize cultural ecosystem services from social media photographs [70].
Google Vision correctly detected vegetable and onion (see Figure 9 and Table 2), but it also incorrectly detected squash, which does not appear in the image. The percentages give the confidence that the algorithm assigns to each detected object.
Google Vision also derived the context of the message, namely local food (see Table 2), which was mentioned neither in the text of the message nor in its hashtags. Such estimates are based on comparisons with similar images in the algorithm's database.
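Label-detection results of this kind are usually post-processed before analysis. Typical steps are converting scores to percentages and discarding low-confidence labels. The sketch below assumes the JSON shape of a Cloud Vision label-detection response: a `labelAnnotations` list whose items carry `description` and a `score` between 0 and 1. The helper `confident_labels`, the 0.7 threshold and the scores in the example are invented for illustration and are not the values in Table 2.

```python
def confident_labels(response: dict, min_score: float = 0.7) -> list[tuple[str, int]]:
    """Return (label, confidence in %) pairs with score >= min_score, highest first.

    `response` is assumed to follow the Cloud Vision label-detection JSON shape:
    {"labelAnnotations": [{"description": str, "score": float in [0, 1]}, ...]}.
    """
    labels = [
        (annotation["description"], round(annotation["score"] * 100))
        for annotation in response.get("labelAnnotations", [])
        if annotation["score"] >= min_score
    ]
    # Sort by confidence, highest first; ties keep their original order.
    return sorted(labels, key=lambda pair: pair[1], reverse=True)


# Invented scores for illustration only (not the values reported in Table 2).
example_response = {
    "labelAnnotations": [
        {"description": "Onion", "score": 0.88},
        {"description": "Vegetable", "score": 0.93},
        {"description": "Produce", "score": 0.55},
    ]
}
print(confident_labels(example_response))  # [('Vegetable', 93), ('Onion', 88)]
```

A threshold removes only weakly scored labels. A false detection that receives a high score would still pass, which is one reason to cross-check image labels against the other parts of the message.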
(2) Text: Frameworks that analyse social media using natural language processing may miss certain kinds of information, because a message from which the hashtags have been removed may contain little or no information. For the onion post (Figure 8), the Cloud Natural Language API [71] was used. This tool has been used in thematically similar studies for sentiment analysis of police agency Facebook pages before and after a fatal officer-involved shooting of a citizen [72] and for sentiment analysis of consumer reviews [73]. Here, the algorithm identified one category, "Food & Drink/Food", with a confidence of 52%. In the sentiment analysis, it classified the message as neutral, with a score of 0.1 (sentiment scores range from -1.0, very negative, to 1.0, very positive).
(3) Hashtags: The message contained the following 24 hashtags: #kitchengardenz #onions #vegegarden #kitchengarden #gardentotable #raisedbeds #growingvegetables #growingfood #urbanpermaculture #foodforest #organicgarden #selfsufficiency #sustainablelifestyle #homestead #ediblegarden #raisedbedgardening #permaculture #veggiepatch #homegrownfood #nzblogger #gardensofinstagram #gardeningnz #myprideofplace #myawapunigarden.
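The text and hashtag outputs above can be handled with a few lines of code. The Python sketch below does three things, and all of its names are hypothetical:

* It extracts and de-duplicates the hashtags listed for the onion post.
* It shows that removing the hashtags leaves no text at all for a post that consisted only of them.
* It maps a sentiment score in [-1.0, 1.0] to a coarse label.

The ±0.25 neutral band is an assumption chosen for illustration. It is not a threshold defined by the Cloud Natural Language API or by SMAHR.

```python
import re

# The hashtags listed for the onion post (Figure 8).
ONION_POST_TAGS = (
    "#kitchengardenz #onions #vegegarden #kitchengarden #gardentotable "
    "#raisedbeds #growingvegetables #growingfood #urbanpermaculture "
    "#foodforest #organicgarden #selfsufficiency #sustainablelifestyle "
    "#homestead #ediblegarden #raisedbedgardening #permaculture "
    "#veggiepatch #homegrownfood #nzblogger #gardensofinstagram "
    "#gardeningnz #myprideofplace #myawapunigarden"
)
HASHTAG = re.compile(r"#\w+")


def normalize_hashtags(message: str) -> list[str]:
    """Extract hashtags, lower-case them and drop duplicates (first-seen order kept)."""
    return list(dict.fromkeys(tag.lower() for tag in HASHTAG.findall(message)))


def text_without_hashtags(message: str) -> str:
    """Remove hashtags and collapse whitespace, i.e. the text left for NLP."""
    return " ".join(HASHTAG.sub(" ", message).split())


def sentiment_label(score: float, neutral_band: float = 0.25) -> str:
    """Map a sentiment score in [-1.0, 1.0] to a coarse label.

    The +/-0.25 neutral band is an illustrative assumption, not an API threshold.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("sentiment score must lie in [-1.0, 1.0]")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


print(len(normalize_hashtags(ONION_POST_TAGS)))  # 24
print(repr(text_without_hashtags(ONION_POST_TAGS)))  # ''
print(sentiment_label(0.1))  # neutral
```

Exact matching keeps near-duplicates such as `#kitchengarden` and `#kitchengardenz` apart. Merging them would require an additional, study-specific normalisation rule.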
For the onion example, a high-quality photo was chosen. Social network users, however, do not create posts with their later analysis in mind. Their photos may be of low quality, and the object-detection algorithm may then misidentify objects. Instagram does not require a post to contain any text, so an algorithm based on natural language processing cannot be applied to text-free posts. The constraints of individual social networks therefore also limit the use of natural language processing. For example, Twitter limits posts to a maximum of 280 characters. As a result, Twitter users often express themselves differently than they would in everyday life, where most communication is not restricted to 280 characters.
This comparison of image classification, natural language processing and hashtag analysis does not identify any single method as best for mass processing of communication on social networks. Gaining knowledge instead requires triangulation, that is, combining these methods, which can be further supported by qualitative research such as questionnaires [28].

Practical Implications
Previous work has shown that the SMAHR framework has practical implications. A study of farmers' markets [12] identified four customer segments and recommended that sellers offer a diverse product portfolio that reflects the characteristics of these segments. These results are consistent with two previous studies [60,61], which collected data from farmers' markets in Beijing and Gipuzkoa, respectively. Another study identified six communities. Knowledge of these communities and their polarization can give organizations insight into how public opinion could help them improve their sustainable business models. One study [14] compared CSR communication in developing and developed countries and showed that communication on social networks differs between them. Its authors recommend that global companies avoid a single unified communication strategy and instead adapt it to each region. This recommendation is also supported by a study [63] that reported findings from two online surveys. The SMAHR framework has also been used to identify the main topics associated with healthy food on Twitter [74]. That study found that people on social networks connect food with lifestyle, that is, not only with nutrition but with the way a person lives. Future research could use artificial intelligence [75] to predict hashtag-based communication on social networks, and structural topic modelling [76] to identify how topics change over time.

Conclusions
Social media has become part of many people's lives. Analysing the data that users create through their interactions on social networks has therefore become a highly topical subject in many research areas. This article shows that hashtags, as a specific part of communication, offer an opportunity to identify users' attitudes and experiences on social networks across a wide range of areas. Moreover, the SMAHR framework provides a methodological procedure and a software module for working with hashtags.
The presented framework systematically codifies the procedure into a few main steps that can be adapted for future research. With this framework, researchers can focus on their research goals while resolving typical problems of the analysis process, such as data selection and transformation. The steps described above guide researchers through each action needed to obtain clear results. The framework thus makes it possible to combine quantitative analysis of hashtags with other research methods.
As a tool for hashtag analysis, the SMAHR framework helps to identify clusters in the analysed social network, to categorize the contextual information contained in posts, and to group users according to relevant criteria.
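Clusters in a hashtag network can be illustrated with a co-occurrence graph. Two hashtags are linked whenever they appear in the same post, and the edge weight counts such posts. The Python sketch below builds these weights. As a deliberately simple stand-in for community detection, it then returns the connected components of the graph after dropping edges below a minimum weight. The posts are hypothetical. Published analyses usually apply a dedicated community-detection method, for example modularity-based Louvain clustering, rather than connected components.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(posts: list[list[str]]) -> Counter:
    """Count, for each unordered hashtag pair, the number of posts containing both."""
    edges: Counter = Counter()
    for tags in posts:
        unique = sorted({tag.lower() for tag in tags})  # sorted so each pair has one key
        edges.update(combinations(unique, 2))
    return edges


def clusters(edges: Counter, min_weight: int = 1) -> list[set[str]]:
    """Connected components of the graph that keeps only edges with weight >= min_weight."""
    adjacency: dict[str, set[str]] = {}
    for (a, b), weight in edges.items():
        if weight >= min_weight:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    seen: set[str] = set()
    components: list[set[str]] = []
    for start in sorted(adjacency):  # sorted for a deterministic output order
        if start in seen:
            continue
        component: set[str] = set()
        stack = [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical posts, each reduced to its list of hashtags.
posts = [
    ["#onions", "#kitchengarden", "#permaculture"],
    ["#onions", "#KitchenGarden"],
    ["#farmersmarket", "#localfood"],
    ["#farmersmarket", "#localfood", "#organic"],
]
edges = cooccurrence_edges(posts)
print(edges[("#kitchengarden", "#onions")])  # 2
print(clusters(edges, min_weight=2))
```

Raising `min_weight` from 1 to 2 drops `#organic` and `#permaculture`, which each co-occur only once with the other tags. This leaves the two core pairs.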
The results of the case studies discussed in this article show that social media analysis applying social network analysis methods to hashtags provides information for theoretical use in scientific research. It also provides information for practical use in strategic marketing and management across many research areas (e.g., organic food, farmers' markets, gamification, sustainability). Some approaches to and possibilities for using the framework are suggested in our discussion as model situations for future research.
Social media analysis provides researchers with diverse data and multidimensional information. Different analyses of content can help explain many current problems in various fields. Comparing hashtag analyses helps to create a comprehensive picture of users' posts.
Author Contributions: Conceptualization, L.P., L.K.S. and R.K.; Methodology, L.P.; Validation, L.K.S., R.K., P.B. and J.P.; Formal analysis, L.K.S. and L.P.; Resources, L.K.S., L.P. and J.P.; Data curation, L.P., L.K.S., P.B. and R.K.; Writing (original draft preparation), L.P.; Writing (review and editing), R.K. and L.P.; Project administration, J.P. and R.K. All authors have read and agreed to the published version of the manuscript.
Institutional Review Board Statement: Ethical review and approval were not required for this study because data collection focused on tweets and excluded personal information. In the studies, each username was coded to a unique ID so that the number of users could be determined. No identifiable private information was collected, in line with the ethical guidelines and definitions of "studies that are not human subjects research".
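Coding usernames to unique IDs, as described above, can be done with a lookup table that assigns the next free ID to each new username. The Python sketch below is hypothetical: the class name, the `U1`-style IDs and the case-insensitive matching are all assumptions. The mapping table itself links IDs back to people, so it should be discarded or kept secure once coding is complete.

```python
class Pseudonymizer:
    """Map usernames to stable opaque IDs ("U1", "U2", ...) in first-seen order."""

    def __init__(self) -> None:
        self._ids: dict[str, str] = {}

    def __call__(self, username: str) -> str:
        key = username.lower()  # assumption: usernames are case-insensitive
        if key not in self._ids:
            self._ids[key] = f"U{len(self._ids) + 1}"
        return self._ids[key]

    @property
    def user_count(self) -> int:
        """Number of distinct users seen so far."""
        return len(self._ids)


# Hypothetical usernames attached to four collected tweets.
coder = Pseudonymizer()
print([coder(name) for name in ["alice", "bob", "Alice", "carol"]])  # ['U1', 'U2', 'U1', 'U3']
print(coder.user_count)  # 3
```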

Informed Consent Statement: Not applicable.
Data Availability Statement: This article used only data from the cited research.