The Investigation of Health-Related Topics on TikTok: A Descriptive Study Protocol

Abstract: The social media application TikTok allows users to view and upload short-form videos. Recent evidence suggests it has significant potential for both industry and health promoters to influence public health behaviours. This protocol describes a standardised, replicable process for investigations that can be tailored to various areas of research interest, allowing comparison of content and features across public health topics. The first 50 appearing videos in each of five relevant hashtags are sampled for analysis. Utilising a codebook with detailed definitions, engagement metadata and content variables applicable to any content area are captured, including an assessment of the video's overall sentiment (positive, negative, neutral). Additional specific coding variables can be developed to provide targeted information about videos posted within selected hashtags. A descriptive, cross-sectional content analysis is applied to the generic and specific data collected for a research topic area. This flexible protocol can be replicated for any health-related topic and may have a wider application on other platforms or to assess changes in content and sentiment over time. This protocol was developed by a collaborative team of child health and development researchers for application to a series of topics. Findings will be used to inform health promotion messaging and counter-advertising.


Introduction
TikTok is a social media application (app) featuring user-generated content that is free to access via a mobile app or webpage. There is significant research interest in its influence on public health behaviours [1]. Users create content, primarily brief videos, with enhancements such as voiceovers, sound effects, music clips, and green screen effects. Uploading users often add a caption to their video as an extra narration, and may include one or more searchable hashtags in this caption to identify that it belongs to a specific topic or trend. Although relatively new to the social media landscape, TikTok has quickly grown to be the fifth most popular platform with over 1 billion users [2]. The app attracts high daily usage: for example, Android users devote more time to TikTok than any other social media app, with an average of 23.6 h per month [2].
Whilst the primary goal of active TikTok viewers is to find and watch entertaining videos [2], the platform is also used to express creativity, seek fame, influence others (i.e., build influencer brands), and expand social networks, particularly by those who upload videos [3]. Additionally, TikTok is a public forum used for product promotion [4], discussion of health concerns [5], or provision of health advice [1] with varying levels of information credibility [5][6][7]. The platform's inherent and attractive characteristics, such as music and comedy in short-form exposures, encourage mimetic behaviour and social learning, making it a powerful tool for promoting behaviour change or product marketing [8,9]. In fact, TikTok is recognised as the "most powerful advertising channel" by the advertising industry [10]. Curated content is fed to users via inbuilt algorithms and the "For you page" feed is based on their previous interactions with videos (views, likes, comments, shares, and favourites). This tailored content often contains information that confirms users' pre-existing beliefs, resulting in reinforced confirmation bias and, thus, acceptance of misinformation [11]. Given that global users spend almost 5 percent of their waking hours watching TikTok videos [2], there may be opportunities for health promotion professionals and organisations to use the platform to influence users, using techniques used by industry to target their audience on the platform [12].
With the growth in the use of TikTok, particularly by young people, there has been corresponding research interest in understanding how topics are portrayed on the platform. Recent systematic reviews have highlighted the investigation of a wide range of health-related topics, covering issues such as mental health, COVID-19, dermatology, eating disorders, cancer, tics, radiology, sexual health, DNA, public health promotion [13], and substance use [12]. The emergence of the COVID-19 pandemic was heavily documented on TikTok, and investigation of topics related to the outbreak commenced soon after [5][14][15][16][17][18][19]. The latest topic investigations have been as broad as "mental health" (e.g., [20]) or health promotion efforts [21], as well as more nuanced and niche topics such as LGBT mental health (in development by Paciente et al.; unreferenced), energy drinks [22], and the portrayal of specific health conditions (e.g., [23][24][25][26]) or medical procedures [27]. Table 1 demonstrates the varying approaches taken to investigating a selection of health topics to date. Studies differ in the number of hashtags selected for analysis, with many selecting the most popular hashtag in a topic and others choosing several hashtags. Most have analysed video and audio content only; however, some researchers have analysed captions and comments in depth. Some studies have investigated characteristics of the video uploader, such as credentials, popularity (measured as previous engagement with that posting user), and characteristics of other videos posted by that user. Previously it was possible to use automated scripts to scrape metadata, comments, and video content from TikTok; however, with recent changes to the Community Guidelines, this is no longer an approved practice. Cross-sectional content analysis is a popular methodology in the investigation of social media posts, with most topic investigations on TikTok to date using some form of this analysis [13].
Content analysis of videos allows researchers to explore the experiences of platform users [32] and can be used to assay indicators of the video's purpose, popularity, and means of mimetic social influence such as "trends" or "challenges". A recent systematic analysis of substance use portrayals on social media platforms noted that about half of their included studies had reported on sentiment [12]. Sentiment analysis, a category of content analysis, is a method of classifying subjective text-based content into an evaluation of polarity, primarily positive versus negative [33]. Popularly, sentiment analysis utilises supervised or unsupervised (lexicon-based) machine-learning methods [33], including the development of detailed sentiment dictionaries, to assess and categorise the overall sentiment of social media content by evaluating text-based content and other social information [34].
Whilst there is a growing body of research investigating disparate health and development topics on TikTok, there has not been a consistent approach to sampling, data collection, and analysis. This protocol has been developed to provide guidance for researchers undertaking content analysis on TikTok, describing a replicable process that can be tailored for topic-specific areas. This consistent approach enables the comparison of content and features of TikTok videos both cross-sectionally (across public health topics) and longitudinally (within specific topics) on TikTok [35].

Materials and Equipment
The overall aim of this protocol is to investigate user-generated content related to any health or development topic on TikTok. Researchers applying the protocol will develop research questions specific to their own topic of interest. For example, the research group leading this protocol will investigate topics within their own child health and development research interests (including parenting tips, diet, disability and sport, intellectual disability, body image, anxiety, and ear health). This protocol has been designed to facilitate a collaborative project comparing the descriptive content of videos aligned with multiple health topic areas of interest in a child health research institution, resulting in a checklist of common elements in popular videos relating to child health. This checklist will inform the development of future public health messaging, including counter-advertising.
As this protocol was specifically developed with this outcome in mind, sentiment analysis is limited to the video and caption content and excludes posting user information and interactions with other users. To allow for the broad application of this protocol, the categorisation of sentiment will be undertaken without the assistance of machine learning. To more fully investigate consumer engagement and levels of misinformation within a specific topic area, researchers should also consider the communicative aspects of the videos by conducting sentiment analysis of comments and studying the persuasiveness and effectiveness of messaging [36].

Detailed Procedure
The protocol, therefore, comprises five main parts: (i) the process to identify relevant hashtags and sample videos for study, (ii) description of a generic codebook relevant to any topic, (iii) instructions to develop an additional topic-specific codebook, (iv) the process of applying the codebook, and (v) the analysis process discussed in Section 4. See Figure 1 for an overview of the study protocol.
Digital 2023, 3, FOR PEER REVIEW 6

i. Identifying hashtags and selecting videos
For each topic area under investigation, one or more research questions are developed. Drawing on these questions, keywords or phrases will be identified from relevant grey and peer-reviewed literature to direct the search for pertinent videos. An example research question for the topic "parenting young children" may be "What proportion of parenting advice given on TikTok videos purports to be from an expert?" Identified keywords may include "parenting tips", "parenting hacks", "parenting toddlers", and "parenting preschool".
To avert potential bias introduced by the TikTok algorithm, which selects videos based on prior history, an incognito browser will be used to search for and select videos, without creating an account. Some features (for example, the search function) operate differently on a web browser and mobile device using the app. Using a mobile phone with a new account, the keywords will be input into the search function within the TikTok app, producing a comprehensive list of hashtags associated with that keyword or phrase, sorted by similarity to the search terms. As the TikTok algorithm returns some hashtags without direct and obvious relevance to the topic, these can be discarded. Of the remaining relevant hashtags, the hashtags with the greatest number of views on the sample date will be selected for investigation. Switching to the incognito web browser, typing the identified hashtag into TikTok's "Top" search function will yield videos relating to that word. Clicking the required hashtag within one of the displayed videos will return a list of all videos with the selected hashtag as their first hashtag listed in the video caption. For each identified hashtag, the first videos returned from the search will be selected for investigation.
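The hashtag selection step described above can be sketched as follows. This is a minimal illustration, not part of the published protocol: the `HashtagCandidate` class, the `select_hashtags` function, and all view counts are hypothetical placeholders, and the relevance flag stands in for the researcher's manual judgement. View counts are assumed to have been recorded by hand on the sample date.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HashtagCandidate:
    name: str       # hashtag text as returned by the in-app search
    views: int      # view count recorded manually on the sample date
    relevant: bool  # researcher's judgement of relevance to the topic


def select_hashtags(candidates, n=5):
    """Discard irrelevant hashtags, then return the n with the most views."""
    relevant = [c for c in candidates if c.relevant]
    ranked = sorted(relevant, key=lambda c: c.views, reverse=True)
    return [c.name for c in ranked[:n]]


# Placeholder values for illustration only; these are not real view counts.
candidates = [
    HashtagCandidate("parentingtips", 900, True),
    HashtagCandidate("parentinghacks", 1200, True),
    HashtagCandidate("unrelatedtrend", 5000, False),
    HashtagCandidate("parentingtoddlers", 300, True),
]

selected = select_hashtags(candidates, n=2)
print(selected)
```

Recording the relevance decision alongside each candidate keeps a record of which hashtags were discarded and why, which may help when the search is repeated at a later date.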
For the project led by the team developing this protocol, five hashtags will be selected for each topic, with the first 50 unique videos returned for each hashtag sampled (noting that metadata for 55 videos may be collected to allow for the replacement of duplicates). This sample size was selected based on prior studies [22] and after piloting the protocol to ensure a sample size sufficiently large that content patterns are evident. At the time of data collection, each video will be assigned a unique ID (recognising its order in the sample) and linked to its Uniform Resource Locator (URL) in a master database (Excel spreadsheet). Engagement metadata (number of likes, number of comments, and number of shares), and a screenshot of the video for verification of these metrics, will be entered in a REDCap survey [37,38] hosted at the Telethon Kids Institute. REDCap is a secure, web-based software platform designed to support data capture for research studies. One survey form will be created for each video identified and linked to both its unique ID and URL. To achieve accurate collection of time-sensitive engagement data, metadata for all videos relating to each topic (n = 250) will be collected within as short a time period as possible, optimally within two days. Where the same video appears in the sample for two hashtags, it will be replaced by the next appearing video in the hashtag sample with the smaller number of likes.
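The duplicate-replacement rule can be illustrated with a short sketch. Several details here are assumptions rather than part of the protocol: the function name `build_samples`, the ID format (`hashtag-NN`), and in particular the way "number of likes" is measured for a hashtag sample, which the text does not define. The sketch simply takes a caller-supplied figure per hashtag (`hashtag_likes`) and keeps a shared video in the sample with the larger figure.

```python
def build_samples(candidates, hashtag_likes, sample_size=50):
    """Build de-duplicated samples from ordered candidate lists.

    candidates:    dict mapping hashtag -> list of video URLs in order of
                   appearance (e.g. 55 per hashtag to leave spare videos)
    hashtag_likes: dict mapping hashtag -> likes figure used to decide which
                   sample keeps a shared video (an assumed, caller-defined
                   measure)
    """
    seen = set()
    samples = {}
    # Processing hashtags from most to fewest likes means a shared video is
    # kept where likes are higher and skipped (replaced by the next
    # appearing video) where likes are lower.
    ordered = sorted(candidates, key=lambda t: hashtag_likes[t], reverse=True)
    for tag in ordered:
        chosen = []
        for url in candidates[tag]:
            if url in seen:
                continue  # duplicate: fall through to the next appearing video
            seen.add(url)
            chosen.append(url)
            if len(chosen) == sample_size:
                break
        # Hypothetical ID format recording the video's order in the sample.
        samples[tag] = [(f"{tag}-{i:02d}", url)
                        for i, url in enumerate(chosen, start=1)]
    return samples


# Toy data with a sample size of 3 instead of 50.
toy_candidates = {
    "a": ["u1", "u2", "u3", "u4"],
    "b": ["u2", "u5", "u6", "u7"],
}
toy_samples = build_samples(toy_candidates, {"a": 100, "b": 50}, sample_size=3)
print(toy_samples)
```

In the toy data, video `u2` appears under both hashtags; it stays in sample `a` and is replaced in sample `b` by the next appearing video.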
ii. Generic codebook including definitions
A detailed generic codebook, including definitions, has been developed to ensure consistency of descriptive data collection across topics. These variables are drawn from prior social media content analyses [30,39] or developed by the team to inform future development of topic-related public health messaging. The definitions have been refined in an iterative process involving over 15 researchers and students to optimise the reliability and utility of these coding variables. In addition to the engagement metadata previously described, variables defined in this section of the codebook are associated with the characteristics of the video (length, audio track, date posted), whether the video poster is recognised as a verified user by the platform ("blue tick"), and the content of the video. Content variables (Table 2) include descriptions of people appearing in the video (number, gender, age), the language of text and audio, and the setting (indoor vs. outdoor); whether the video's purpose is primarily informative (including educational or instructional); overall sentiment (negative, neutral, positive); whether TikTok video techniques are used (green screen, duets or stitches); whether interaction is actively encouraged; and whether there are links to challenges or trends. To minimise variability in the coding of the variables which are most open to subjective interpretation, such as age, gender, sentiment, and sexualised elements, researchers will be trained in the use of the codebook using examples for each variable (training examples are available upon request), and consistency between coders will be assessed throughout the process.
iii. Codebook development for specific topic areas
Additional coding variables specific to topics will be developed from relevant literature and associated with topic-specific research questions. At least two members of the research team, including a designated primary coder and secondary coder, will independently code a minimum of five videos randomly sampled from each hashtag (total n = 25 videos) to test the suitability of topic-specific variables and specificity of definitions. In an iterative process, additional variables and refinement of codebook definitions will be determined by the agreement of the research team.
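One way to draw the pilot sample of five videos per hashtag is sketched below. The function name `pilot_sample`, the fixed seed, and the toy video IDs are illustrative assumptions; the protocol itself only specifies that the videos are randomly sampled. A seeded generator is used so that both coders can reproduce the same selection.

```python
import random


def pilot_sample(samples, per_hashtag=5, seed=2023):
    """Randomly pick per_hashtag video IDs from each hashtag's sample."""
    rng = random.Random(seed)  # seeded so the draw can be reproduced
    return {
        tag: sorted(rng.sample(video_ids, per_hashtag))
        for tag, video_ids in samples.items()
    }


# Toy samples: five hashtags with 50 hypothetical video IDs each.
toy_ids = {
    f"tag{h}": [f"tag{h}-{i:02d}" for i in range(1, 51)]
    for h in range(1, 6)
}
pilot = pilot_sample(toy_ids)
print(sum(len(ids) for ids in pilot.values()))  # total videos in the pilot
```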
Exclusion criteria specific to topic areas will also be determined by the agreement of the research team. For example, several videos selected for sampling in an energy drinks investigation were for supplements not included in the definition of energy drinks [22]. This will allow the exclusion of videos that have had a spurious hashtag applied and those for which the specific codebook variables will not be applicable.

Table 2. Generic codebook variables, survey questions, and definitions.

| Content Variable | Question Prompt and Response Options | Definitions |
|---|---|---|
| Actors | How many people are in the video? 0, 1, 2, 3, 4, 5, >5 | The number of people or active "characters" who engage with the video/camera, or who are involved with/contribute to the video. Exclude narrator(s) if not pictured or their face is not shown. Hands do not count as a person; they must have their face in the video. |
| Sentiment | Response options: positive, negative, neutral | Using all aspects of the video, including verbal messages, music, and written messages: "Positive": overall the video depicts the topic in a positive way; it may promote or encourage a behaviour or product, or indicate the user enjoys or is supportive of a behaviour. "Negative": overall the video depicts the topic in a negative way; it may deter or discourage a behaviour or product, or indicate the user does not enjoy or is not supportive of a behaviour. "Neutral": overall the video does not depict a clear position on the behaviour or product, or is ambiguous. |
| Mode | How is the information in the video communicated? (not mutually exclusive; select all that apply) Speech, text, visual, not applicable | "Speech": one or more characters or the narrator speaking. "Text": text inserted into video or caption. "Visual": main point of the video is just filming something demonstrative, such as making a meal. |
| Challenge | Is the video in response to a #challenge? Yes, no. What is the name of the challenge? | TikTok challenges are started by users and often involve viewers completing a task or achievement and posting video proof with the hashtag attached. If there is a hashtag including the word "challenge", record this. (Note: for some popular challenges, creators will not use the hashtag or specifically indicate that it is a challenge, although regular viewers will recognise it as one. For consistency of coding, only categorise "Yes" if #challenge is used.) |
| Interaction | Does the TikTok video, caption, or hashtag encourage interaction from other users? Yes, no | This includes, but is not limited to, requesting viewers to "like", "comment", "follow" or "share", or expressions along these lines (e.g., "tell me what you think?"). Some creators will say in the video or caption "like/share/comment if you can relate" or "follow for more tips" or similar. |
| Duet | Is the video a duet? Yes, no | A TikTok duet plays split- or green-screened with another user's TikTok video; it looks like two or more videos playing simultaneously and is commonly used in singing videos and reaction videos. Sometimes #duet is written in the caption. |
| Stitching | Is there stitching involved? Yes, no | The video may incorporate a clip of another user's video sequentially, often a snippet at the start of the video, to add a comment or react to it. Sometimes #stitch is written in the caption. |
| Green screen | Is the green screen effect used? Yes, no | The green screen effect uses a recorded video and a different background or shape. |
| Warning | Is there a warning or disclaimer present? Yes, no | "Yes": sometimes written as TW (trigger warning); also includes TikTok's warning bar that appears at the bottom of the video. |
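The response options in Table 2 lend themselves to a simple machine-readable representation that can be used to check coded records before analysis. The sketch below is an illustration only: the variable keys, the `validate_record` function, and the record layout are hypothetical and do not reflect REDCap's own data dictionary format or field names.

```python
YES_NO = ("yes", "no")

# Allowed responses per variable, transcribed from Table 2.
# The dictionary keys are hypothetical names chosen for this sketch.
GENERIC_CODEBOOK = {
    "actors": ("0", "1", "2", "3", "4", "5", ">5"),
    "sentiment": ("positive", "negative", "neutral"),
    "mode": ("speech", "text", "visual", "not applicable"),
    "challenge": YES_NO,
    "interaction": YES_NO,
    "duet": YES_NO,
    "stitching": YES_NO,
    "green_screen": YES_NO,
    "warning": YES_NO,
}
MULTI_SELECT = {"mode"}  # "select all that apply" variables


def validate_record(record):
    """Return a list of problems found in one coded video record."""
    problems = []
    for variable, allowed in GENERIC_CODEBOOK.items():
        if variable not in record:
            problems.append(f"{variable}: missing")
            continue
        answer = record[variable]
        if variable in MULTI_SELECT and not isinstance(answer, str):
            answers = list(answer)
            if not answers:
                problems.append(f"{variable}: no option selected")
        else:
            answers = [answer]
        for a in answers:
            if a not in allowed:
                problems.append(f"{variable}: invalid response {a!r}")
    return problems


good = {
    "actors": "2", "sentiment": "positive", "mode": ["speech", "text"],
    "challenge": "no", "interaction": "yes", "duet": "no",
    "stitching": "no", "green_screen": "no", "warning": "no",
}
bad = dict(good, sentiment="mixed", mode=[])
del bad["duet"]

print(validate_record(good))
print(validate_record(bad))
```

Topic-specific variables could be added to the same dictionary once their definitions are agreed, so that the generic and specific codebooks are checked in one pass.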
iv. Applying the codebook
Once the variables and associated definitions are finalised, an online survey will be created and managed in REDCap. Survey questions associated with each variable and response options will follow the generic and specific codebooks, with a new survey form created for each video within one project, identified by the unique ID created at the time of selection. The primary coder will code all videos (n = 250) across the five identified hashtags. The secondary coder will code a random sample of at least 10% of the videos across all five hashtags (n = 25). Agreement between the two coders will be determined, with a Cohen's kappa of >0.8 being acceptable. If this statistic is not initially achieved, differences will be discussed by the investigator team to reach an agreement, and the sample re-coded.
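For readers unfamiliar with the agreement statistic, Cohen's kappa compares the observed proportion of agreement, p_o, with the agreement expected by chance, p_e, as kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for one categorical variable coded by two coders. The function name `cohens_kappa` and the toy sentiment codes are illustrative; only the 0.8 threshold comes from the protocol. In practice a statistics package would normally be used.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equal-length lists of categorical codes."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same code by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal proportions,
    # summed over categories.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both coders used one identical category throughout; kappa is
        # formally undefined and is treated as perfect agreement here.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Toy sentiment codes for ten videos (illustrative data only).
primary = ["pos", "pos", "neg", "neg", "neu", "pos", "neg", "neu", "pos", "neg"]
secondary = ["pos", "pos", "neg", "neu", "neu", "pos", "neg", "neu", "pos", "pos"]

kappa = cohens_kappa(primary, secondary)
acceptable = kappa > 0.8  # threshold stated in the protocol
print(round(kappa, 3), acceptable)
```

In this toy example the coders agree on eight of ten videos, yet kappa falls below the 0.8 threshold once chance agreement is removed, which is the situation in which the protocol calls for discussion and re-coding.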

Expected Results
Application of this protocol will yield descriptive data pertaining to the portrayal of one selected topic on TikTok. Descriptive statistics (counts and percentages) will be used to summarise the engagement, content, and sentiment variables in the generic and topic-specific codebooks for each hashtag. Within a topic, variables will be compared using basic statistical analyses to determine whether these variables differ across selected hashtags within a topic. All tests of significance will be determined at an alpha level of 0.05.
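As an illustration of the descriptive step, the sketch below tabulates counts and percentages of one coded variable by hashtag and computes a Pearson chi-square statistic for the resulting table. The protocol refers only to "basic statistical analyses", so the choice of a chi-square test is an assumption made for this example, as are the function names and toy records. Obtaining a p-value for comparison with the 0.05 alpha level would normally be left to a statistics package.

```python
from collections import Counter


def crosstab(records, row_key, col_key):
    """Count records by row_key (e.g. hashtag) and col_key (e.g. sentiment)."""
    table = {}
    for r in records:
        table.setdefault(r[row_key], Counter())[r[col_key]] += 1
    return table


def percentages(counts):
    """Convert one row of counts to percentages of the row total."""
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()}


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a crosstab."""
    rows = list(table)
    cols = sorted({c for counts in table.values() for c in counts})
    row_tot = {r: sum(table[r].values()) for r in rows}
    col_tot = {c: sum(table[r][c] for r in rows) for c in cols}
    n = sum(row_tot.values())
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            stat += (table[r][c] - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return stat, dof


# Toy records: two hashtags, four videos each (illustrative data only).
records = (
    [{"hashtag": "A", "sentiment": "positive"}] * 3
    + [{"hashtag": "A", "sentiment": "negative"}]
    + [{"hashtag": "B", "sentiment": "positive"}]
    + [{"hashtag": "B", "sentiment": "negative"}] * 3
)

table = crosstab(records, "hashtag", "sentiment")
stat, dof = chi_square(table)
print(percentages(table["A"]), stat, dof)
```

With real samples of 50 videos per hashtag, small expected cell counts may still occur for rare categories, in which case an exact test may be more appropriate than the chi-square approximation.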
Where the protocol is applied to multiple topics, variables in the generic codebook can be compared to determine common elements across popular videos in disparate topics, and to identify points of difference.

Discussion
Findings from the application of this protocol are intended to inform the development of evidence-based public health messaging, including counter advertising for specific topic issues revealed during the investigations. Application of the protocol across health-related topics will allow the comparison of common elements across all topic areas to inform the development of a checklist of the most common marketing features and video characteristics.
The unique marketing potential of the TikTok platform, already recognised by industry, could then be harnessed by public health researchers as a valuable dissemination strategy. Therefore, this work will inform the research community to develop efficacious dissemination of findings and to identify the topic areas that have the greatest need for input to combat misinformation.
This flexible protocol is designed to be replicable across any health-related topic and may have wider application for other topic areas and on other social media platforms with user-generated short-form videos. Additionally, should a time-sensitive topic, challenge, or hashtag become popular (e.g., COVID-19, vaccine hesitancy), the topic-specific codebook can be developed quickly and applied. The protocol is described as a cross-sectional analysis; however, given the rapidly changing social media landscape, it could be replicated longitudinally using the same keywords or hashtags to assess changes in video content over time.
A limitation of this protocol is that only the video content and engagement data are subject to content analysis. Analysis of users' comments would provide a rich understanding of the discourse around topics on the platform and is an important direction for future research. It should be noted that although this protocol describes the analysis of only publicly available data, the current Community Guidelines of TikTok prohibit the use of automated scripts to collect data from posted content.

Data Availability Statement:
No new data were created or analysed in this study. Data sharing is not applicable to this article.