Incidental Data: A Survey towards Awareness on Privacy-Compromising Data Incidentally Shared on Social Media

Abstract: Sharing information with the public is becoming easier than ever before through the numerous social media platforms readily available today. Once posted online and released to the public, information is almost impossible to withdraw or delete. More alarmingly, postings may carry sensitive information far beyond what was intended to be released, so-called incidental data, which raises various additional security and privacy concerns. To improve our understanding of the awareness of incidental data, we conducted a survey where we asked 192 students for their opinions on publishing selected postings on social media. We found that up to 21.88% of all participants would publish a posting that contained incidental data that two-thirds of them found privacy-compromising. Our results show that continued efforts are needed to increase our awareness of incidental data posted on social media.


Introduction
The reckless use of social media and other online services can expose insights into personal affairs beyond what was intentionally meant to be shared. The posted data might reveal significantly more information than first thought, particularly when analyzed and combined with auxiliary databases. For instance, it has been shown that algorithms can effectively estimate a person's personality from likes [1][2][3]. Given the rich multimedia data that are posted in large volumes on social media, one might suspect significant unintended disclosure of personal information among the many inconspicuous pictures and videos of family and friends. Such unintended disclosure of personal information can threaten someone's privacy, which motivates investigating how aware a person is when posting content on social media. In this paper, we study the awareness of unintentionally published data on social media, commonly referred to as incidental data [4]. We conduct a survey that assesses the participants' awareness of incidental data implicitly, without the influence of the question itself, as the mere assessment of a privacy concern increases its awareness [5]. As a methodology, we used a survey that asked questions regarding postings that contained incidental data. In our previous research [6], we analyzed postings using Open Source Intelligence (OSINT) methods while limited to two hours per target. Within that time limit, it was possible to detect data that were not intended to be published. Further, data found within those postings led to further data that most found privacy-compromising. The responses from the quantitative survey are then analyzed using statistical methods.
Our survey provides robust evidence of a significant lack of awareness around incidental data in social media postings, as more than one-fifth of participants would share content containing hidden data they find privacy-compromising. We have indicators that suggest an awareness that hidden data can be a threat to privacy; thus, a focus on proper guidelines and education to help prevent the self-publication of incidental data may be a point of improvement. This paper makes the following contributions:
• Presents a novel survey methodology that indirectly evaluates participants' awareness of incidental data, avoiding the influence of the question itself;
• Provides robust empirical evidence of the lack of awareness among social media users regarding the privacy implications of their online content;
• Highlights the prevalence of incidental data in social media posts and its potential threat to user privacy.
In the first part of this paper, we discuss the current state of the art and summarize the views on the psychological impact of disclosure or nondisclosure of information. Then, we provide insights on our study design and further list the questions of our survey, including answer choices. Next, we show and describe our results in Section 4 before we evaluate and discuss those results in Section 5. We summarize our study with a final conclusion in Section 6.

Background
Privacy on social media is often discussed in terms of either privacy-jeopardizing settings or malicious actors [7]. However, Krämer and Schäwel [8] discuss the urge of people to self-disclose personal information on social media. Schneier [4] defined a taxonomy of the different data connected with the usage of social media, namely, the following: service, disclosed, entrusted, incidental, behavioral, and derived data. In particular, Schneier [4] argues that incidental data in this context are data posted by other people, over which one has no control.
Definition 1. "Incidental data is what other people post about you: a paragraph about you that someone else writes, a picture of you that someone else takes and posts. Again, it's basically the same stuff as disclosed data, but the difference is that you don't have control over it, and you didn't create it in the first place." Schneier [4].
In previous work [6], we argue that the term incidental is used in case something unexpected was found that should not be there. For instance, during an X-ray examination meant to assess a potential bone fracture, the discovery of tumorous tissue is termed an incidental finding [9]. Considering this more general meaning, we argue that one's unawareness of unintentionally publishing problematic data, alongside the primary reason for publishing content on social media, also leads to the uncontrollability of personal data. Fitting the intent behind the definition by Schneier [4], we propose an extended Definition 2.
Definition 2. Incidental data is data that one has no control over, either due to another person disclosing it or the unawareness of its existence within data disclosed by oneself.

Privacy from a Psychological Perspective
As argued by Schlosser [10], self-disclosure can be defined as communicating personal information about oneself to another person that is a close representation of oneself, whereas self-presentation is defined as controlled and directed information that impacts the impressions people form of oneself. Barasch [11] describes intrapersonal effects as emotions and processes within oneself, whereas interpersonal effects concern relationships between others and oneself.
Luo and Hancock [12] state that disclosure fulfills basic social needs and thus improves one's well-being. Krämer and Schäwel [8] continue that privacy is an intrapersonal secondary need for people. Equally important is the view of the intrapersonal and interpersonal cost of not disclosing information. Sharing a problem (privileged information) can help to improve one's situation by gaining new views on a personal topic or the view of others about oneself [13]. However, the consequences of sharing information are often overestimated [13]. Furthermore, the sharing of secrets can be used in a strategic manner to evade criticism and gain support [14]. Consequently, it can be said that disclosing information can have positive effects and be vital for oneself.
Social media seems to have filled a perfect spot that can fulfill the human motivation for self-disclosure [8]. However, this also entails dangers, as interactions with social media as simple and trivial as giving likes to certain posts can give away personal information [1,15].
Brough and Martin [5] claim that research on privacy is strongly focused on a user's motivation to protect their personal data from unauthorized usage, which correlates with privacy concerns, while very little of that research addresses privacy knowledge. The authors further state that privacy concerns might be artificially increased when they are being assessed.
Automated data collection and the usage of specialized algorithms can reveal sensitive information about one's life [16,17]. Fast and Jago [16] find that people underestimate the risks of sharing personal data; moreover, people seem unable to take strong actions even after severe privacy violations [18]. Such behavior comes from focusing on benefits and convenience combined with not being an explicitly identifiable victim [19]. Conversely, the benefit and convenience of data collection and usage of algorithms pose a massive threat to one's privacy; however, this may have created a state of mind where people think it is not realistically possible to stop it.

Privacy from a Technical Perspective
When people interact with a social network of their choice, it can be assumed that the main goal is to share content and not to tackle a host of privacy settings. This can be problematic as companies have discovered that user data, especially of a large group of people, can be a valuable asset. Even though there are good examples of user-based privacy, there are companies that take advantage of people's behavior. As discussed by Bösch et al. [20], such methods are referred to as dark privacy strategies or dark privacy patterns. Research in human-computer interaction and user-experience design has found that people are more likely to press a button in a rush if it is green. This led to situations where companies made the accept button for "allow cookies" or "share statistical data" prompts slightly larger and green, whereas the decline button is slightly smaller and gray [21,22].
Al-Charchafchi et al. [23] found in their review that users are threatened in multiple ways. The threat vectors concern information privacy, social engineering, data leakages through unfit privacy settings, or Application Programming Interface (API) weaknesses. A similar line is taken by the work of Johansen et al. [24], with the authors providing an insight into the problems and opportunities of lifelogging systems. In forensics and in court, the analysis of Electric Network Frequency (ENF) is increasingly used to verify timestamps or the untampered integrity of audio and video recordings [25][26][27].

Privacy from an Awareness Perspective
The quantitative study of Amon et al. [28] on interdependent privacy provides valuable insights into aspects of privacy awareness, especially the sharing of private information of other persons. The study analyzed 245 responses on 68 real-world pictures out of 13 categories through a questionnaire about the likelihood of sharing the given pictures, their entertainment value, and their privacy rating. The study also assessed the specific personality traits known as the dark triad, which focuses on narcissism, psychopathy, and manipulative personality style. Even though the study gives valuable insights into privacy awareness on social media, it focuses on pictures shared by others. Based on the responses on the pictures and personality traits, a cluster analysis yielded the following three interdependent privacy user categories: privacy preservers, privacy ignorers, and privacy violators. The study reveals that privacy ignorers score low on the dark triad and have low levels of education but prefer personal privacy. Privacy violators score high on the dark triad, have high levels of education, and further prefer openness as a key motivation factor for sharing potentially sensitive pictures of other persons.
Padyab et al. [29] conducted two sub-studies regarding privacy awareness on social media based on exploratory focus groups. The first tackles dedicated algorithms on social media; the second explores self-disclosure. These studies show that users were generally unaware of the extent to which published data can be used to extract private information. Further, it was shown that a user's awareness could be raised by letting them use an extraction tool on their own social media profile.

Implementation
We conducted the following four separate surveys: IDS2301, IDS2301U, IDS2302, and IDS2302U. It is important to mention that the presented study relies only on one survey, namely, IDS2301. Nonetheless, surveys IDS2301U, IDS2302, and IDS2302U were implemented in order to obtain reliable results, as explained in Section 5.4. An overview of each survey, including the motivation, can be found in Table A1. In accordance with the General Data Protection Regulation (EU) 2016/679 (GDPR), an online survey provider was chosen in order to conduct the survey and collect responses. For the social media postings, we decided to use two well-analyzed postings from our previous work [6]. While limited to two hours per target, Kutschera [6] found that privacy-compromising data unintentionally posted could be found by using OSINT. For the IDS2301 survey, we ran a test phase where 12 participants were asked to give feedback on the consistency, subjective understandability, and potential typos. The feedback was used to improve the survey. At the end of the survey, a link to the dedicated follow-up survey was presented. To motivate participation, students were offered bonus points counting toward their final grade for completing the survey. To claim points, a student had to submit a self-generated random Universally Unique Identifier (UUID) token as part of the follow-up survey and subsequently to the university's e-learning platform. Submitting the token as part of a separate follow-up study instead of the main study allowed us to correctly identify students for the purpose of crediting points while at the same time preserving their anonymity in the main survey. Section 5.4 discusses this aspect in more detail. The instructions on how to claim points were only revealed at the end of the main survey, which reduced the risk of students going directly to the follow-up study to claim credits, skipping the main survey. It was certainly possible to receive these instructions out-of-band, circumvent the protection mechanism, and claim points without accessing the main survey. However, this was by design, as we found it preferable over the case where students would fast-click through the survey and submit bogus data. Our scheme gives no incentive to complete the survey multiple times, as bonus points will only be received once.
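The token step can be illustrated with a short sketch. The paper only states that students generated a random UUID themselves; the use of Python's standard `uuid` module, the version-4 check, and the function names `generate_token` and `is_valid_token` are assumptions made for illustration, not the tooling used in the study.

```python
import uuid


def generate_token() -> str:
    """Return a random (version 4) UUID as a string, e.g. for a bonus-point claim."""
    return str(uuid.uuid4())


def is_valid_token(token: str) -> bool:
    """Check that a submitted string parses as a version-4 UUID (assumed format)."""
    try:
        parsed = uuid.UUID(token.strip())
    except (ValueError, AttributeError):
        return False
    return parsed.version == 4


token = generate_token()
print(token, is_valid_token(token))
```

Because the token is random and carries no personal data, submitting it in a separate survey links a student to a claim without linking the student to the answers in the main survey.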

Assessment Design
In our survey, we judiciously employed 5-point and 6-point Likert scales [30] for distinct sets of questions driven by the nature of the responses we sought to capture. For 23 of our questions, we utilized the 5-point Likert scale, acknowledging its capacity to provide a balanced range of options from "Strongly Disagree" to "Strongly Agree," along with a neutral midpoint. Including a neutral option in these cases allows for a more accurate representation of respondent attitudes, particularly when they may lack a definitive opinion or possess moderate views [31]. This configuration was especially suitable for questions Q1 and Q2, as depicted in Table 1, and Q3.1 to Q3.21, as seen in Table 2, where neutrality or a middle-ground perspective was a plausible and informative response. Conversely, for 14 of our sub-questions, we chose the 6-point Likert scale. By compelling respondents to lean towards agreement or disagreement, the 6-point scale aids in delineating clearer, more decisive insights into specific attitudes or opinions, which is particularly valuable in areas where a neutral stance is less informative or relevant to our research objectives [32]. The absence of a neutral midpoint in the 6-point scale is instrumental in scenarios where decisiveness in responses is critical, or neutrality could result in ambiguous data interpretation [31,33]. We intended to compel respondents to take a definitive stance on Q7.1-Q7.14, as shown in Table A4, thereby eliminating the central tendency bias, where participants might gravitate towards a neutral choice.

Questions
Our main survey consisted of 14 main questions and 72 sub-questions. For the initial questions Q1 and Q2, as seen in Table 1, participants were presented with two example scenarios, one for each question. Each scenario was made up of three pictures, and we asked if they would share the content publicly if it was theirs. The questions consisted of a 5-point Likert scale entry in combination with an open-ended sub-question. Because such reflective questions can be influenced by later questions in the survey [5], we ask these questions first.
For Q1, participants were presented with Example 1, consisting of three images from a video where someone shows the surroundings of a rural area that can be assumed to be their home, as seen in Figure 2. The video title, Wild Oklahoma Weather, indicates that the video is about an upcoming severe storm. The participants were asked to consider if they would share the video if the depicted house was theirs. For Q2, participants were presented with Example 2, consisting of three social media postings, as seen in Figure 3, and we asked them similarly if they would share the images publicly if they depicted their own property and surroundings. Question Q3, as seen in Table 2, asks the participants about their privacy perceptions on the various data types that are detectable from the social media postings in Example 2 according to the OSINT analysis method proposed by Kutschera [6]. Several other key sensitive data types, like date of birth (Q3.3), blood type (Q3.6), and social security number (Q3.7), are also included.
Questions Q4 and Q5, as seen in Table 3, ask participants about their perception of various privacy guidelines they practice currently (Q4) and in the future (Q5). The questions Q4 and Q5 differ only in their usage, namely, Q4-current and Q5-future. As the guidelines are the same, they are best represented in the joined Table 3. The purpose of Q5 is to see to what extent participants had their perceptions influenced by participating in this survey.
Questions Q6 and Q7 are about social media usage (Tables A3 and A4). Questions Q8-Q14 cover demographic values, as seen in Table A5. We use these demographic data to organize respondents into various sub-group filters. The abbreviations on the filters used for each subgroup are listed and explained in Table 4. Besides these active responses, the survey provider also collected the start and end timestamps for each survey. These start and end times allow us to calculate the time spent on the survey.
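As a minimal sketch of the time-spent calculation, the snippet below subtracts a start timestamp from an end timestamp. The ISO 8601 timestamp format, the example values, and the function name `minutes_spent` are assumptions; the export format of the survey provider is not described in the paper.

```python
from datetime import datetime


def minutes_spent(start: str, end: str) -> float:
    """Return the survey duration in minutes from two ISO 8601 timestamps."""
    started = datetime.fromisoformat(start)
    finished = datetime.fromisoformat(end)
    if finished < started:
        raise ValueError("end timestamp precedes start timestamp")
    return (finished - started).total_seconds() / 60


# Hypothetical timestamps, for illustration only.
print(minutes_spent("2023-01-10T10:00:00", "2023-01-10T10:12:30"))
```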

Results
Table 5 shows the percentage of participants who responded with either agree or strongly agree on questions about the privacy compromise for the various data types listed in questions Q3.1-Q3.20. The percentage of those that disagreed or strongly disagreed is shown in Table A6. The background color in both tables is graded from green to red through yellow based on the cell value. Within both tables, the data type that can be found is shown more visually within rows E.1 and E.2, respectively. Further, the data types correlate to questions Q3.1-Q3.20.
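The cell values described here can be sketched as a simple share of agreeing responses. The numeric coding of the 5-point scale (1 = strongly disagree to 5 = strongly agree), the sample data, and the function name `percent_agreeing` are assumptions for illustration.

```python
def percent_agreeing(responses, agree_from=4):
    """Percentage of responses at or above `agree_from` (agree or strongly agree)."""
    if not responses:
        raise ValueError("no responses")
    agreeing = sum(1 for r in responses if r >= agree_from)
    return round(100 * agreeing / len(responses), 2)


# Hypothetical answers of eight participants to one Q3 sub-question.
sample = [5, 4, 3, 2, 4, 1, 5, 3]
print(percent_agreeing(sample))  # 50.0
```

The share of disagreeing responses can be obtained analogously by counting codes of 2 or lower.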
The boxplot in Figure 4 depicts the statistical properties of the responses, with the median at the tapered point with an orange line, and supports the results presented in Table 5. The adjoining areas indicate the 25% above and below the median, the whiskers indicate the first and fourth quartile of responses, and outliers are indicated by a circle.
Column N states the number of participants in each row. Column Abbr. lists all subgroups using their filter identifier as explained in Table 4. The cells represent the percentage of participants of the said subgroup who responded with either agree or strongly agree on questions about the privacy compromise regarding data types Q3.1-Q3.20. Based on the cell value, the background color is graded from green to red through yellow.
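The quantities a boxplot summarizes can be sketched as a five-number summary. This uses `statistics.quantiles` from the Python standard library with its default method; plotting libraries may use a different quartile convention and whisker rule, so the values need not match Figure 4 exactly. The sample data and the function name are hypothetical.

```python
import statistics


def five_number_summary(responses):
    """Return (minimum, Q1, median, Q3, maximum) for a list of Likert codes."""
    if len(responses) < 2:
        raise ValueError("need at least two responses")
    q1, median, q3 = statistics.quantiles(responses, n=4)
    return (min(responses), q1, median, q3, max(responses))


# Hypothetical responses on the 5-point scale.
sample = [4, 2, 5, 3, 4, 5, 4]
print(five_number_summary(sample))  # (2, 3.0, 4.0, 5.0, 5)
```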
The percentages of positive answers to Q4 regarding the current usage of awareness guidelines are shown in Table 6, while the percentages of positive answers to Q5 regarding the future usage of awareness guidelines are shown in Table 7. The same filter groups are used as for Q3 in Table 5. Based on the cell value, the background color is graded from green to red through yellow.
In Q6 and Q7 of our survey, participants were asked to optionally answer questions about their social media usage, what social media platforms they use, and to what extent on a 6-point Likert scale with the following options: no answer (0), never (1), very rarely (2), rarely (3), occasionally (4), frequently (5), and very frequently (6), with 0 as the default value. The results are visualized in Table A2, whereas the questions are listed in Tables A3 and A4. The background color in Table A2 is determined by the value of the cell from green to red through yellow.
The privacy awareness guidelines proposed by Kutschera [6] are enumerated in Table 3. Table 8 shows how each guideline can prevent the exposure of a certain data type. For instance, enforcing guideline Q4.4 will help minimize the exposure risk of current state or country (Q3.1), date of birth (Q3.3), security measures against burglars (Q3.19), and absence of security measures against burglars (Q3.20). Naturally, OSINT has manifold ways of detecting data types, and some can be obtained by gaining knowledge of another data type first. For example, price of property (Q3.13) or date of property purchase (Q3.14) may become evident through the detection of full address (Q3.4). Those data types are listed in the Indirect column of Table 8.
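The direct and indirect relations described for Table 8 can be sketched as two lookup tables. Only the two examples given in the text are encoded here; the full table is not reproduced, and the names `DIRECT`, `INDIRECT`, and `reachable` are hypothetical.

```python
# Guideline -> data types whose exposure it helps to minimize (from the text).
DIRECT = {"Q4.4": ["Q3.1", "Q3.3", "Q3.19", "Q3.20"]}

# Data type -> data types that may become evident once it is known (from the text).
INDIRECT = {"Q3.4": ["Q3.13", "Q3.14"]}


def reachable(detected):
    """Expand a set of detected data types with those obtainable indirectly."""
    seen = set(detected)
    stack = list(detected)
    while stack:
        current = stack.pop()
        for follow_up in INDIRECT.get(current, []):
            if follow_up not in seen:
                seen.add(follow_up)
                stack.append(follow_up)
    return seen


print(sorted(reachable({"Q3.4"})))
print(DIRECT["Q4.4"])
```

Treating the indirect column as a graph makes it explicit that protecting one data type, such as the full address, also protects the data types that depend on it.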

Evaluation and Interpretation of Survey Results
Our study aims to detect privacy awareness on social media implicitly. In line with this methodology, the study never asked about awareness of incidental data in a direct question, as this might have influenced the participants and thus rendered the study invalid. Moreover, we used well-analyzed postings from our previous research, of which we knew exactly what data types could be discovered in a strict time frame of up to two hours. In the first step, the participants had to answer whether they would have posted the content shown, see Table 1. In the second step, the participants had to answer which data type would compromise their privacy if shared, see Table 2. By combining the results of both questions and the data types found in each example, we gained implicit knowledge about whether the participants would have shared a certain data type and also had concerns about this data type, as shown in Table 5. Below, we split the evaluation into topical sections to evaluate and interpret the present survey results from this study.
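The two-step combination can be sketched as follows: first select the participants who would post an example, then compute how many of them are concerned about a data type contained in it. The record layout, the coding of the 5-point scale, the sample data, and all names are assumptions for illustration, not the analysis code of the study.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    would_post: int                               # answer to Q1 or Q2, coded 1..5
    concern: dict = field(default_factory=dict)   # e.g. {"Q3.4": 5}


def implicit_unawareness(responses, data_type, agree_from=4):
    """Return (% of all who would post, % of those posters concerned about data_type)."""
    posters = [r for r in responses if r.would_post >= agree_from]
    if not posters:
        return (0.0, 0.0)
    concerned = [r for r in posters if r.concern.get(data_type, 0) >= agree_from]
    share_posters = round(100 * len(posters) / len(responses), 2)
    share_concerned = round(100 * len(concerned) / len(posters), 2)
    return (share_posters, share_concerned)


# Hypothetical responses of four participants.
sample = [
    Response(5, {"Q3.4": 5}),
    Response(4, {"Q3.4": 2}),
    Response(2, {"Q3.4": 5}),
    Response(1, {"Q3.4": 4}),
]
print(implicit_unawareness(sample, "Q3.4"))  # (50.0, 50.0)
```

A high second value for a data type that the example actually contains indicates that posters are concerned about that data type yet would still share it.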

Implicit Incidental Data Awareness
Upon taking a closer look, it becomes evident that in certain cases, a supermajority of people concerned about their privacy with regard to a specific data type are willing to share content that can be used to reveal those specific data types.
Data type parcel number (Q3.12) was included, but it only reached a simple majority of 62.96% (E1P) and 64.29% (E2P), respectively. The discussed details are visible in Table 9, which is an excerpt of Table 5.
In summary, the participants of E1P and E2P are concerned about data types Q3.2, Q3.4, Q3.8, and Q3.20, but are also very likely to share a post containing those data types. This allows us to draw an implicit conclusion that these individuals are unaware of incidental data contained in certain postings. Together, these results provide important insights into the awareness of sharing privacy-compromising incidental data.
Furthermore, it is interesting that 93.75% of those in subgroup LOB are concerned about the car license number (Q3.15), but only 76.04% of the overall group ALL and 66.28% of subgroup LRS, respectively, are concerned about the same data type. The reason for this could either be that people who live on their own property are more aware of what can be revealed or what harm can be done through a car license number, or that car license number was misunderstood as something other than a license plate.
As for data type full name of previous owners (Q3.5), 62.5% of those in the LOB and LOS groups are concerned, whereas in the ALL group 47.92% are concerned. The concern is significantly lower in subgroup ALP with 25.0%, and it is 40.7% in the LRS subgroup. A possible explanation for this difference is that property owners are more aware of the potential risks that the name of previous owners can pose in comparison with people who live in rented accommodation.
Another subgroup of interest is LRB, where 83.67% are concerned about Q3.19 (security measures against burglars), but only 62.5% of LOB are concerned, whereas in the overall group ALL 71.88% are concerned. A possible explanation for this is that people who live on their own property have full power over installation and can also choose on their own to implement concealed and potentially strong measures against burglars, whereas people who live in rented accommodation need the approval of the landlord and will not get compensated in case they move to different housing. This reasoning could lead to a decision for cheaper, movable, and thus non-concealed measures against burglars.

Usage of Guidelines
From Table 6, we observe that a two-thirds supermajority of participants (i.e., subgroup ALL) currently use guidelines Q4.1, Q4.6, Q4.7, and Q4.8, whereas Q4.5, Q4.10, Q4.11, and Q4.12 are currently only used by one-third of the same group. In contrast, for the answers regarding future usage of the guidelines, Table 7 shows the highest response on Q5.2 (Avoiding reflections on surfaces and mirrors) and the lowest response on Q5.10 (Close curtains or avoid windows). The most feedback received was on Q5.0 (None other than those before) with 42.71%. Merely asking questions about privacy concerns influences participants [5]. These results and the overall low rate of response on future usage of the mentioned guidelines suggest that the survey design did not greatly influence the participants.
From subgroup ALP's responses on Q4.9-Q4.11 and Q5.9-Q5.11, as seen in Table 3, we see that few countermeasures are in place today and that this situation will likely be the same in the future. This is interesting as Table 5 shows that 81.25% of the same subgroup ALP "Agree" and "Strongly Agree" that data type full address (Q3.4) can compromise their privacy. Moreover, it can be assumed that full address (Q3.4) can be discovered very quickly when map material (Q4.9 and Q5.9) is included in a post, which the subgroup ALP would publish as per the definition in Table 4.
As discussed in Section 5.1.1, subgroups E1P and E2P are willing to post data that a majority of the group has privacy concerns about. Furthermore, the usage of the guidelines in Table 8 reveals that the measures stated in the guidelines, which may well have prevented the publication of incidental data, are not used. For example, in order to avoid data type Q3.4 incidental data, one can focus on guidelines Q4.1, Q4.2, Q4.5, Q4.6, and Q4.8-Q4.13. Only two to three guidelines are used by a supermajority in groups E1P and E2P, as shown in Table 6.

Similar Studies
There is a notable gap in the existing literature regarding the implicit analysis of individuals' awareness when sharing postings on social media potentially containing incidental data. To the best of our knowledge, there exists no study we can directly compare with. While not directly addressing this specific aspect, the studies by Padyab et al. [29] and Amon et al. [28] offer an approximate point of comparison. The study by Amon et al. [28] focuses on the psychological motivations behind users posting pictures of others, whereas the research by Padyab et al. [29] confronts participants with data extraction tools on their own social media profiles.

Statistical Significance
This study uses a confidence level of 95%. The resulting confidence interval reflects an estimated range of values and indicates the accuracy of the estimate. The margin of error, which is also used for statistical evaluation, is 7.07% in our study. This indicates the accuracy of the estimate in relation to an entire group. Altogether, the results of the study of 192 students reflect the opinions and awareness of Austrian students. Equation (1) represents the formula used for the margin of error E with Standard Error of the Proportion (SEP) and Finite Population Correction (FPC).
$$E = Z \cdot \underbrace{\sqrt{\frac{p\,(1-p)}{n}}}_{\text{SEP}} \cdot \underbrace{\sqrt{\frac{N-n}{N-1}}}_{\text{FPC}} \qquad (1)$$
Here, E is the margin of error, Z is the Z-score associated with the desired confidence level, p is the estimated proportion of the population (set to 50%), n is the sample size, and N is the population size. The Z-score was set at 1.96. The target population is Austrian students, with a population of 288,381 as of February 2022 [38,39]. The effect of the population size is negligible because the FPC is 0.9999.
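The margin-of-error calculation can be checked with a short sketch. It assumes the standard form of the formula, that is, the Z-score multiplied by the standard error of the proportion and by the finite population correction; the function name `margin_of_error` is hypothetical, while the input values are those stated in this section.

```python
import math


def margin_of_error(n: int, N: int, p: float = 0.5, z: float = 1.96) -> float:
    """Margin of error for a proportion with finite population correction."""
    sep = math.sqrt(p * (1 - p) / n)      # standard error of the proportion
    fpc = math.sqrt((N - n) / (N - 1))    # finite population correction
    return z * sep * fpc


e = margin_of_error(n=192, N=288_381)
print(f"{100 * e:.2f}%")  # 7.07%
```

With a population this much larger than the sample, the correction factor changes the result only in the third decimal place, which is why the population size has little influence on the reported margin.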

Trustability of Survey Results
All responses were received from students who received bonus points towards their grades. Due to data protection and ethics, the survey was designed to be 100% anonymous. Within the course, students had to enter one or multiple UUID tokens into the university submission system in order to receive the offered bonus points. This token also had to be entered into the token collection surveys IDS2301U or IDS2302U. A student was not allowed to have multiple tokens in the IDS2301U survey. However, IDS2302U received multiple entries since this was also the token submission survey for students who attended three courses. Further, we are able to analyze the data of the token collection surveys IDS2301U and IDS2302U and of the university submission system.
In order to understand why we are highly certain students did not take a survey twice, we need to describe the process in more detail. The survey and bonus point granting workflow is also visualized in Figure 5. Emails with the link to the survey IDS2301 were sent out, along with emails to students who attended one or more classes. Students had to go through the survey to find the link for the token collection survey IDS2301U. Students had to generate the UUID token by themselves and enter it into IDS2301U. Since IDS2301U was a two-question survey (email for updates and token), it would have been easier for students to ask their peers for the link and simply enter their UUID, hence claiming that they had done the survey (IDS2301), rather than actually going through the survey (IDS2301). An analysis of the university data and the token collection survey showed that zero students claimed bonus points for multiple courses in IDS2301U. It is important to mention that not all enrolled students were graded and that not all graded students needed bonus points since the link to the survey was sent out close to the end of the course. Hence, students could estimate if an excellent grade was already reached or not, thus rendering bonus points useless.
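A check of this kind can be sketched as a cross-reference between the tokens collected in a follow-up survey and those submitted to the university system. The data layout, the normalization of tokens, and the function names are assumptions for illustration; the paper does not describe how the analysis was implemented.

```python
from collections import Counter


def duplicate_tokens(submissions):
    """Return tokens that appear more than once, after trimming and lowercasing."""
    counts = Counter(token.strip().lower() for token in submissions)
    return sorted(token for token, count in counts.items() if count > 1)


def unmatched_tokens(survey_tokens, university_tokens):
    """Return tokens submitted to the university that the survey never received."""
    known = {token.strip().lower() for token in survey_tokens}
    return sorted({t.strip().lower() for t in university_tokens} - known)


# Hypothetical shortened tokens, for illustration only.
survey = ["a1", "B2", "b2 ", "c3"]
university = ["a1", "c3", "d4"]
print(duplicate_tokens(survey))              # ['b2']
print(unmatched_tokens(survey, university))  # ['d4']
```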

Conclusions
Disclosure of private information is crucial to social interactions, yet the awareness of privacy-compromising data hidden within a self-disclosed posting needs more attention. This study extends the previous study of Kutschera [6] and uses the comprehensively analyzed postings shown in Figures 2 and 3. The implicit design of survey questions allowed us to gain insight into the awareness of people about private data. Furthermore, our study shows that awareness about incidental data is very low, and this constitutes a privacy and security concern. Our survey shows clear privacy concerns on data types full name (Q3.2), full address (Q3.4), information on relatives (Q3.8), phone number (Q3.9), parcel number of the property (Q3.12), security measures against burglars (Q3.19), as well as the absence thereof (Q3.20). Although participants were not forced into a decision, as they could respond neutrally on the 5-point Likert scale, 14.06% to 21.88% responded that, despite their privacy concerns, they were willing to publish a posting that, knowingly or not, contains information considered privacy-compromising, and thus incidental data.
Even though our survey used a confidence level of 95%, the margin of error of 7.07% with 192 responses is still above the common standard of 5%. Further, the results of our survey are limited with regard to interpretation, as the survey only asked Austrian students. With that in mind, we recommend a more widespread survey on the privacy and security issues of incidental data. Policymakers should also be made aware of these issues so that they can implement guidelines or other latent mechanisms that either raise awareness among the general public or alert persons before they publish potentially harmful postings.
the study design, the survey shall only be taken once**. The survey is anonymous and single responses cannot be traced back. The collected data will be analyzed, and results may be published in a journal or at a conference. "IDS2301": https://. . . Your feedback is valuable to our research, and we want to reward your time and effort. In order to gain the extra 10%, you need to drop a token into TeachCenter. This token is a randomly generated UUID and submitted to us in two ways. (This token concept won't affect the anonymity). **For those attending two or more lectures: You will get an additional email with information on how to gain 10% for each of your courses. Hint: Follow-up surveys. Thank you in advance for your participation and valuable feedback.

Kind Regards, Stefan
Appendix A.2. Email to Students Who Are Enrolled in Multiple Courses
Dear students, you receive this email because you are enrolled in more than 1 course (ASD/ST or MA). We're excited to finally present you the opportunity to earn 10% bonus points for another course with us. Our research will investigate privacy on social media through a survey. As per the study design, the survey shall only be taken once. The survey is anonymous and single responses cannot be traced back. The collected data will be analyzed, and results may be published in a journal or at a conference. It is important for the study design that the first survey with the ID "IDS2301" is filled out first (previous email), as they rely on each other. "IDS2302": https://. . . Your feedback is valuable to our research, and we want to reward your time and effort. In order to gain the extra 10%, you need to drop a token into TeachCenter. This token is a randomly generated UUID and submitted to us in two ways. (This token concept won't affect the anonymity).
Thank you in advance for your participation and valuable feedback.

Kind Regards, Stefan
Table A1. Overview of surveys used in this study.

IDS2301
Main survey with 14 questions. First presentation of pictures to students.

IDS2301U
In case IDS2301 was answered: update survey for bonus point token and optionally the possibility to stay informed about this study.

IDS2302
In case students needed to earn bonus points for more than one lecture: another survey with open-ended questions regarding the pictures already seen in IDS2301. This survey has the sole purpose of preventing students from answering IDS2301 multiple times and thereby influencing the results as a whole.

IDS2302U
In case IDS2302 was answered: update survey for bonus point token and optionally the possibility to stay informed about this study.

Appendix B. IDS2301 Survey Questions & Results
Table A2. Shows the different social media platform usages and the respective behavior of uploading content (UC) and consuming content (CC). Cell background is based on the value of the cell. Column N states the number of participants in each row. Column Abbr. lists all subgroups using their filter identifier as explained in Table 4. The cells represent the percentage of participants of the said subgroup who responded with either disagree or strongly disagree on questions about the privacy compromise regarding data types Q3.1-Q3.20.

3.1. Recruitment
Participants were recruited from three different courses at Graz University of Technology, Austria: INH.04062UF Agile Software Development (170 students), INP.32600UF Mobile Applications (45 students), and INP.33172UF Software Technology (84 students). Of all 299 students, 198 optionally participated in survey IDS2301, as shown in Figure 1. Students who were signed up for two or more lectures were only allowed to take the IDS2301 survey once.

Figure 1. Venn diagram of the number of enrolled students in the three courses we recruited participants from and how they overlap (left); and the number of responses before and after cleaning (right).

Figure 2. Shows the pictures presented to participants in the first question, Q1 and Q1.1. Subfigures (a-c) show different scenes from the video. (a,b) combined hint at the shape of the backyard, whereas (c) depicts a smartphone with a weather app showing an incoming storm and the current position as a blue dot. The pictures were taken from Kutschera [6] and [34], respectively.

Figure 3. Shows the pictures presented to participants in the second question, Q2 and Q2.1. Each subfigure (a-c) represents a different posting from the same person on Twitter. The posting shown in (a) is a response to a question asking whether the car is still owned, (b) is an unprompted comment about how beautiful the day is, and (c) comments on the end of the day, with the skyline and a small pool visible alongside the moon. The pictures were taken from Kutschera [6] and Twitter [35][36][37], respectively.

Figure 4. Depicts the boxplot visualizing the statistical values, such as the median and the quantiles, of all answers from Q3.1 to Q3.20.

Figure 5. Depicts the workflow a student must follow in order to receive bonus points, how anonymity is preserved, and how data are kept clean and trustworthy. Blue indicates a student action, whereas green indicates a lecturer or researcher role. Students who were enrolled in two or more courses received, alongside the IDS2301 survey link, the information and link to the secondary IDS2302 survey. The IDS2302 survey shows the same postings but asks, in an open question, what data and information participants find in them. At the end, a link to the dedicated token collection survey IDS2302U was revealed.
Any other piece of data that may compromise your privacy if shared incidentally? (optional) 5-P Likert Scale

Table 3. Survey IDS2301 Question 4 (current guideline usage) and Question 5 (future guideline usage). The numbering of awareness guidelines is aligned through Q4.n and Q5.n.
Q4: What measures against the accidental posting of private information do you currently use actively when posting on social media?
Q5: What measures against the accidental posting of private information will you additionally use in the future? (optional)
Be on the lookout for reflections in mirrors as well as on surfaces such as cars, windows, vitrines, glasses, sunglasses, or watches. Boolean
Q5.3, Q4.3: Post content of vacations, if at all, only after the vacation has ended. Boolean
Q5.4, Q4.4: Avoid repetition of vacations or periods of absence, such as "during New Year's Eve I am always on a one-week trip".

Table 4. Lists the abbreviations and their filtered participants of the analysed subsets of the survey. The letters of the abbreviation's origin are underlined within the description.
Row E.1 within Table 5 marks the data types extractable from Example 1, as shown in Figure

Table 6. Shows the percentage of positive answers towards current usage of guidelines from Q4. Based on the cell value, the background color is graded from green to red through yellow.

Table 7. Shows the percentage of positive answers towards future usage of guidelines from Q5.

Table 9. Shows an excerpt of data types from Question 3 (Q3.n) (Table 5). It displays the presence (X) or absence (O) of specific data types in 'Example 1' (E.1) and 'Example 2' (E.2). The 'Abbr.' column lists subgroups using filter identifiers (see Table 4). Cells indicate the percentage of subgroup participants agreeing or strongly agreeing to publish the posting. Cell highlighting denotes extractable data types with a simple majority in yellow and a supermajority in orange.