Ethical Applications of Big Data-Driven AI on Social Systems: Literature Analysis and Example Deployment Use Case

Abstract: The use of technological solutions to address the production of goods and offering of services is ubiquitous. Health and social issues, however, have only slowly been permeated by technological solutions. Whilst several advances have been made in health in recent years, the adoption of technology to combat social problems has lagged behind. In this paper, we explore Big Data-driven Artificial Intelligence (AI) applied to social systems.


Introduction
The Internet was supposed to ensure no bad deed goes unnoticed [1]. Instead, social media and the metastasis of "fake news" [2] have made misinformation more widespread than ever. Blockchain technology was supposed to provide a stable distributed currency [3], beyond the control of a central entity. Instead, Bitcoin is a volatile commodity [4] often associated with black-market, "dark web" transactions. By itself, the development and deployment of technology does not address social problems [5]; rather, each technological improvement creates as many new problems as it does solutions [6]. Technology per se is orthogonal to its application in a complex social fabric: technology is not a panacea [7]. Addressing specific societal problems requires focused and thoughtful deployment of technological solutions, in close collaboration with the intended user population [8]. For example, the development of user-friendly, affordable home medical diagnostics systems [9] can alleviate the influx of low-risk chronic patients to the medical system [10], improving patients' quality of life while increasing available bandwidth for higher-need patients. Autonomous, communicating cars with advanced sensing technology, capable of group intelligence, can decrease road mortality rates and reduce traffic congestion [11].
These examples highlight technological deployment to improve established social mechanisms (e.g., health care and transportation [12]). However, we argue that it is possible (and attractive) to use technology to create and enable new social mechanisms [13]; i.e., to leverage technology to enable novel social solutions. For example, eCommerce [14] has enabled new markets, empowered individual entrepreneurs, and disrupted established socioeconomic systems. AirBnB, for instance, enabled individuals to overcome the advertising and transaction barriers that were stopping them from sharing their home [15].
In this manuscript, we present our ongoing work in this direction: specifically, on social computing for combating youth homelessness and senior loneliness in a home-share setting. Whilst our research is still in the preliminary stages, we believe our methodological design, accounting for social/health/legal aspects of social computing deployment, will ignite some interesting scholarly debate in this field.
We begin by providing a critical analysis of social computing literature, highlighting the dangers posed by Big Data and general limitations of social computing in its current form. We then review the social fabric that propelled our work (the loneliness epidemic and youth homelessness), followed by our ongoing use-case, describing our methodology for ethically-aligned design, and identify the potential risks and benefits of such social computing research. Specifically, this manuscript offers the following contributions:
• A critical analysis of the state of the art in social computing.
• A review of social aspects that propel our use case (homelessness and loneliness).
• A methodology for ethically-aligned social computing research.
The remainder of this manuscript is organized as follows: Section 2 reviews the state of the art for social computing, the fallacies in Big Data use, and the limitations of the current state of social computing. Section 3 describes the motivating problems for our use-case, a use-case description, and an analysis of identified risks and prospective rewards. Finally, Section 4 offers our concluding remarks and suggestions for future work.

Social Computing: State of the Art
Social computing refers to the application of technological solutions to profound social problems [16]: an umbrella term meant to incorporate any activity, from small to large, that endeavors to convey and reinforce computing's social relevance and potential for positive societal impact [17]. Social computing extends affective computing (computing that relates to, arises from, or deliberately influences emotions [18]) to incorporate social space as a logical architecture [19]; i.e., an integration of social attributes and social intra-/inter-relationships among human beings and other physical objects or cyber entities. Over the last few decades, we have seen the rapid ascent of computing technology for health behavior change and well-being [20] (persuasive technology [21], positive computing [22], and affective computing [23]), and growing interest among policymakers and health care providers in interventions that motivate positive health behavior change, particularly interventions leveraging the capabilities of computing technology. This trend is expanding to encompass social aspects [24], promoting research and design projects that examine or intervene in large-scale social issues [25], making use of social relationships among end users for the construction of cyber-social solutions [26]. As human factors are involved, this direction requires scholars to engage directly with (or against) political and social stances, often addressing historical inclinations towards treating technological development as unquestionably progressive.
Social computing can be deployed at several levels of social strata: perhaps one of the most popular nowadays is Internet of Things (IoT)-powered autonomous vehicles [26], where vehicles perform as sensor hubs that capture information through in-vehicle or smartphone sensors, obtain individuals' social relationships through network analysis, and extend the concept of social relationships among vehicles to promote the effectiveness and efficiency of service [26]; i.e., implementing a form of situational awareness (SA) [19] to distill cluttered social data into actionable knowledge through pattern recognition, interpretation, and evaluation. eHealth, another branch of IoT, identifies technology as a particularly useful tool for supporting and managing health conditions at home. A specific example is smart home technology and wearable devices to monitor individuals living with Alzheimer's [27]. Such technologies support individuals in staying in their homes longer and provide reassurance to caregivers, who can monitor the health and safety of family members or patients. Resistance to such living arrangements is often rooted in concerns about safety, for all parties. In-home monitoring has been recognized as a strategy to support individuals to live independently and safely for longer durations of time [28]. However, the fundamental challenge in implementing new technologies in personal home settings is privacy. In a longitudinal study on in-home sensor technologies, 72% of older adult participants reported acceptance of in-home monitoring and were willing to share data with doctors or family members [29]. However, 60% of participants had concerns related to privacy and security, reinforcing the importance of thoughtful deployment and implementation of technologies within a user's home. Maintaining independence superseded issues of privacy and security in numerous studies [30][31][32].
Leveraging data in meaningful ways can lead to greater individual safety, security, and independence.

The "Big Data" Fallacy
Google now has access to the medical data of millions of Americans [33]. Ethical and moral implications aside, the (likely) goal is to sift that data through machine learning (ML) systems to derive new knowledge about health and medicine: just another day in the age of big data [34], when industry and academia alike are finding novel uses for the abundance of available data, primarily through the use of state-of-the-art ML. The caveat: you should not trust the results at all.
Reading the previous paragraph, readers might infer we are skeptical of ML. We are not. ML is an incredibly powerful technology: massively oversimplifying, ML is, at its heart, a statistical analysis tool. Based on training data (e.g., pictures with and without cats in them), ML can derive statistical information that allows it to make decisions about new, previously unseen inputs. ML is an ideal tool for all sorts of recommendation systems (e.g., Amazon) and as a data analysis tool in general.
Why the skepticism about analysis of medical data then? Because the data are flawed [35]. Hence, regardless of how good the data processing tool is, results will be flawed. We are not implying that the data are inaccurate: in all likelihood, all medical data are correct. The key point here is that they were not obtained through proper scientific processes (controlled experiments), and that makes statistical analysis useless.
Let us illustrate. Examples of good data gathering include political polls [36] and medical testing. In polls, care is taken to ensure polled voters are geographically and demographically (age, gender, ethnicity...) distributed, so they make up a representative cross-section of the country. In medical testing (e.g., determining whether a new drug is effective against the flu), care is taken to ensure that external factors that may influence the result are accounted for: typically, there is one test group (using the new drug), one control group (not using anything at all), and one placebo group (not using anything at all, but believing they are being given the new drug). Both of these examples demonstrate carefully constructed data gathering processes, ensuring that data analysis (ML or otherwise) results in trustworthy correlations (relationships between different properties).
However, big data, including the medical data Google now has access to, is not like this. It is data "out in the wild", and that makes it full of biases that compromise results. This phenomenon is vividly illustrated by Berkson's paradox [37]: a study by Berkson [38] found that, looking only at hospital data, one can find many (completely false) inverse correlations between different diseases (showing, for example, that you are less likely to have gallbladder inflammation if you have diabetes). If you look at properly sampled data from the population at large, those correlations disappear. The reason for the paradox is this: sampling from hospital data, a patient without diabetes is more likely to have gallbladder inflammation than a random individual, since the patient must have had some non-diabetes (possibly gallbladder inflammation) reason to be in the hospital in the first place. We are extremely skeptical, not of ML, but of the data being used to train it. We are seeing this trend in industry and academia alike, and it is worrisome. People believe that more data will result in better results. Yet we have seen it time and time again: we know that wild data encodes racist, sexist, and other biases [39]. All sorts of spurious correlations can be found in big data.
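To make Berkson's paradox concrete, the following simulation sketch generates a population in which two diseases are statistically independent, then shows how conditioning on hospital admission manufactures an inverse association between them. The disease names, prevalences, and admission rates are illustrative assumptions, not Berkson's actual figures:

```python
import random

random.seed(42)
N = 100_000
P_DIABETES = 0.10      # disease A; independent of B in the population
P_GALLBLADDER = 0.10   # disease B
P_OTHER_ADMIT = 0.05   # background admission rate for unrelated causes

# Simulate individuals; assume either disease leads to hospital admission.
population = []
for _ in range(N):
    a = random.random() < P_DIABETES
    b = random.random() < P_GALLBLADDER
    admitted = a or b or (random.random() < P_OTHER_ADMIT)
    population.append((a, b, admitted))

def rate_b_given(sample, has_a):
    """Fraction of the sample with disease B, among those with/without A."""
    group = [(a, b) for a, b, *_ in sample if a == has_a]
    return sum(b for _, b in group) / len(group)

# Population at large: B is (correctly) independent of A.
pop = [(a, b) for a, b, _ in population]
print(rate_b_given(pop, True), rate_b_given(pop, False))    # both ~0.10

# Hospital sample only: a spurious inverse association appears, because
# admitted patients without diabetes needed some other reason to be there.
hosp = [(a, b) for a, b, adm in population if adm]
print(rate_b_given(hosp, True), rate_b_given(hosp, False))  # ~0.10 vs ~0.69
```

No ML is involved here at all: the bias lives entirely in the sampling, which is exactly why a better model cannot repair improperly collected data.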

Limitations and Bottlenecks
Beyond the technological requirements [19], social computing confronts another challenging issue: it must evolve towards a multi-stakeholder ecosystem with more human participation, encompassing issues such as ownership, control management, affiliation relationship modeling, trustworthiness evaluation, human behavior formalization, educational training, behavioral conventions, legal regulations, social administration, public services, and economic supervision for the social participants [19], as well as social group formation, trust management, and evaluation [26]. Projects that utilize users' information, such as location, identity, mobility patterns, and social connections, to provide services [26] are often framed as explicit attempts to leverage the capacity of technology for social change in the service of addressing what have been identified as underlying social challenges, such as widespread poverty or a lack of access to education [25]. However, projects focused on predominantly technical approaches to social change often fail because they lack social and political sustainability, underscoring a need for technologists to engage more explicitly with precisely the key concerns of social justice: the multiplicity of stakeholders, power relations, and the unevenness of social and political systems [25], particularly in regard to communities that, by their social and material conditions, are vulnerable, such as homeless communities [40]. In such communities, the active avoidance of harms and plans for their mitigation are particularly important, especially when considering (lack of) access to technology [20] and adoption inertia.
Research has shown that retro-fitting existing systems and enhancing new ones with added technological layers may be faster, more cost-efficient, and more scalable than the development of novel systems. For example, activity trackers and mobile phones can be leveraged [20]: affect (i.e., mood, emotion, feelings) largely determines individual perceptions, cognitions, and behaviors towards technology [41], and is positively influenced by the perception of "anytime-ness", i.e., experiencing the psychological readiness generated by having access to information at any time [42]. However, the tension between privacy and independence, particularly as technology moves into people's homes, remains the single biggest barrier to adoption. Strategies such as on-premise video analysis, in which the video itself never leaves the home and only summary indicators/analytics are transmitted, can alleviate such concerns.
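As a toy illustration of the on-premise strategy (all names and the "analysis" itself are hypothetical simplifications, not a deployed system), a local node might reduce raw frames to a summary indicator, so that only the indicator ever crosses the home boundary:

```python
from dataclasses import dataclass

# Toy grayscale frame: 1 = motion detected in that cell.
Frame = list

@dataclass
class SummaryIndicator:
    room: str
    occupancy: int  # coarse count only; no imagery

def analyze_on_premise(room: str, frame: Frame) -> SummaryIndicator:
    # Toy "analysis": count motion cells as occupants. A real deployment
    # would run person detection on-device; raw pixels never leave this scope.
    occupancy = sum(cell for row in frame for cell in row)
    return SummaryIndicator(room=room, occupancy=occupancy)

def transmit(indicator: SummaryIndicator) -> dict:
    # Only the summary indicator crosses the home boundary.
    return {"room": indicator.room, "occupancy": indicator.occupancy}

frame = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
print(transmit(analyze_on_premise("kitchen", frame)))
# prints {'room': 'kitchen', 'occupancy': 2}
```

The design point is architectural rather than algorithmic: privacy is preserved by what the transmitted data structure can represent, not by trusting downstream consumers.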

Motivating Problem 1: The Loneliness Epidemic
Loneliness is a commonly cited condition in older adults and is correlated with increased morbidity and mortality [43], decreased cognitive function, and poor sleep quality [44]. Loneliness is caused by a perceived deficit of social relationships [43]. Technology and loneliness have a complex relationship, and researchers are finding conflicting evidence on technology's positive and/or negative impact on loneliness. Scholars have noted that we are currently experiencing a "loneliness epidemic" [45]. Some researchers suggest technology and access to social media isolate people despite the goal of connecting them. Alternatively, other researchers found that technology can reduce loneliness [46]. These conflicting findings suggest that the impact of technology, and human relationships with it, can be extremely complicated and requires purposeful and strategic rationale and implementation. This epidemic of loneliness is already quantified [47]: social isolation is a significant emerging issue for seniors in Canada [48] and negatively impacts their health [49]. One of the risk factors for social isolation is living alone [50]. Over 25% of seniors in Canada live alone [51], 40% of whom face housing affordability issues. A 2019 federal government report found that, "addressing the need for affordable housing that offers varying levels of support is one of the most pressing challenges facing governments today" (p. 18) [52]. Moreover, 20% of Canada's homeless population is 13-24 years old [53], with lesbian, gay, bisexual, transgender, queer, and other persons not included in the previous categories (LGBTQ+) making up 25-40% of youth experiencing homelessness; as such, this has also been identified as one of Canada's most urgent youth equity issues.

Motivating Problem 2: Youth Homelessness
Technological solutions have also been utilized to address homelessness. In Canada, youth homelessness responses have centered around providing emergency services and supports through shelters and day programs [54]. Interestingly, technology has been utilized to address youth homelessness in a number of previous studies, from ensuring accurate homeless counts, to creating technical triage tools to assess individual vulnerability [55]. There have been algorithms designed to support individuals in finding appropriate housing resources [56]. However, as Gary Blasi, a professor of law at UCLA, put it, "Homelessness is not a systems engineering problem. It is a carpentry problem" [56]. Such a statement draws attention to the fact that there are simply not enough housing options for homeless or precariously housed individuals. Indeed, Gaetz et al. [54] argue the importance of rapid and sustainable exits from youth homelessness towards housing stabilization. Rather than build new homes, we propose a novel intervention that capitalizes on unused space within existing homes owned by older adults [57]. Specifically, we propose to utilize sensor and AI [58] technologies to enable home-sharing between older adults, in need of companionship and assistance, and youth at risk of homelessness. We propose to use technology to overcome the security concerns [59] and inherent vulnerability of the two targeted groups: the elderly [60] and youth at risk of homelessness [61]. Technologies will address the largest concerns currently blocking such a solution: privacy, security, and anomaly detection [62].

Ethical Social Computing: Use Case Description
Our research is now focused on leveraging AI technology to tackle the problems of seniors' social isolation, youth homelessness, and affordable housing shortages. Working across the fields of Computer Engineering, Health and Social Sciences, and Law, and in partnership with existing community organizations, our research advances a model of social inclusion where seniors host precariously housed youth in their homes. We hypothesize this model will enable sustainable homelessness prevention, and decrease social isolation of both youth and seniors. However, the success of this model depends on the assurance of safety and security for both populations.
We recognize monitoring and assessing individual behavior in the home is inherently intrusive. The trade-off for secure social inclusion is, therefore, one's privacy. This research seeks to maximize security while minimizing the impact on privacy by using non-invasive sensing technology. "Non-invasive", in terms of privacy, can be understood from two different perspectives: human and technical. From a human perspective, this means that technology fundamentally does not record key modalities of human activity (e.g., audio or video), whilst metadata or even location data are acceptable for most people [41]. From a technical perspective, there are myriad aspects that apply, including anonymization, physical data storage, cybersecurity concerns, etc., but these are outside the scope of this paper.
We hypothesize that such technology, combined with AI techniques for behavior and affect classification, and connected to an appropriate response mechanism (e.g., 911), can achieve a suitable balance between security and privacy. This research has six key objectives:

1. Determine the best strategies to pair and support the cohabitation of seniors and youth;
2. Identify and understand diverse senior and youth perspectives on the use of third-party research technology (including, for example, ethnic-cultural minorities and LGBTQ+ seniors);
3. Determine how best to use technology to support the safety and comfort of both populations;
4. Assess changes in measures of quality of life and loneliness for both populations;
5. Identify, classify, and suggest recommendations to limit privacy implications; and
6. Design, develop, and test AI training methodologies and sensing technology that uniquely fit our application domain.
The first phase of this research is thorough stakeholder engagement through key informant interviews and focus groups with the targeted communities. Using a mixed methods research approach, we will collect demographic data and include questions to assess perceived barriers and benefits of technology (including current use, privacy expectations, etc.), loneliness, social cohesion, and quality of life (measures attempt to capture indicators of both personal health status and social well-being), all framed within gender and socio-cultural based analysis.
Next, to test our hypotheses, a controlled experiment will pair volunteer seniors with researchers (RAs) adopting the role of hosted youth. We will deploy a suite of affordable, non-intrusive sensors (Figure 1) to measure behavior before and after hosted youth move in. This will provide baseline data of typical behavioral patterns (Figure 2), examining the spectrum and modality of captured data and researching sensing technology (e.g., sensors, signal processing, sensor fusion) to best extract analytics from the raw data. Finally, we will use these data to train AI models to automatically detect behaviors (e.g., changes in movement patterns or room occupancy) that are likely to result in unsafe situations. We will also measure technology interaction, perceived intrusiveness, and authenticity of behavior for subsequent technological improvements. Figure 2 illustrates our proposed processing stack. Raw sensor data (which may include audio, ultrasound, pressure, device usage, etc.) are fused by local processing nodes (implemented using low-cost microcontrollers connected through a local network) into higher-level analytics: for example, speech volume, room occupancy, or physical distance. This information is gathered in a central processing unit running the ML layer, which essentially corresponds to a multi-modal signal processing system capable of correlating that information with low-level behavioral information (e.g., the amount of time participants spend in the same room) dictated by the social model (a function of mixed methods analysis, including qualitative participant interviews and computational updates). This low-level behavioral information is then fed into the behavioral analytics layer: a processing system (potentially handcrafted, more likely ML-based) that can correlate that low-level information with high-level human properties, such as mood.
This is again informed by the social model and, in turn, updates it dynamically (adapting to participants' evolving behavior).
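The layered stack just described can be sketched as follows. This is a hypothetical, heavily simplified illustration: the function names, thresholds, and hand-written rules are stand-ins for the sensor fusion and ML components that are themselves the subject of this research:

```python
def local_node_fuse(raw_readings: dict) -> dict:
    """Local processing node: raw sensor data -> higher-level analytics."""
    return {
        "speech_volume_db": max(raw_readings["mic_samples"]),
        "room_occupancy": sum(raw_readings["pir_triggers"].values()),
    }

def ml_layer(analytics: dict, social_model: dict) -> dict:
    """Central unit: analytics -> low-level behavioral information,
    parameterized by the social model."""
    return {
        "co_located": analytics["room_occupancy"] >= 2,
        "raised_voices": analytics["speech_volume_db"]
                         > social_model["volume_threshold_db"],
    }

def behavioral_layer(behavior: dict) -> str:
    """Behavioral analytics: low-level behavior -> high-level property."""
    if behavior["co_located"] and behavior["raised_voices"]:
        return "possible_conflict"
    return "normal"

# Illustrative run: two occupants in one room, speech peaking above threshold.
social_model = {"volume_threshold_db": 70}
raw = {"mic_samples": [40, 55, 72], "pir_triggers": {"kitchen": 2}}
state = behavioral_layer(ml_layer(local_node_fuse(raw), social_model))
print(state)  # prints "possible_conflict"
```

In the actual system, the hand-written rules in the middle layers would be replaced by trained models, and the social model's parameters would be updated dynamically from the mixed methods analysis.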
It is important to note the use of data in this setting. Our primary criticism of Big Data use concerns inferring correlations or causations from data collected outside of any controlled experiment; this is why spurious correlations appear (in other words, improper sampling). Our experiment is designed precisely to avoid this problem: by analyzing the same host population pre- and post-move-in, we can derive correlations for which we can define some statistical confidence. In a deployment setting, which data are collected is defined beforehand, to ensure that we do not see spurious artifacts arising from improper (context-dependent) sampling. Whilst several high-risk factors (security, perception, mental health, etc.) challenge this research, our work has the potential to contribute an AI technology-based model of social inclusion, reaping the high rewards promised by AI towards sustainable societies. The following sections illustrate some potential risks and prospective rewards, but let us first quickly discuss the scalability aspects of such an experiment: particularly, why simulation is not suitable in this scenario.
Ideally, such an experiment would be conducted with sufficient participants to ensure that, for any hypothesis, some degree of statistical confidence would be achieved. In reality, this is clearly not possible. One possibility, often used in other scenarios, is to use computer simulations to model behavior, thus scaling up testing. Whilst we agree that this is an attractive possibility, we are skeptical of the capability to model all the nuances of human behavior (including mood and affect) that drive so much of social computing: hence, our focus on embedding mixed methods analysis with real participants for better interpretation, rather than relying on simulation techniques.
Upon successful completion, this research project will deliver insights that will inform policy on social models for sustainable housing, inclusion, and well-being; methodologies for using mixed methods as data labeling for AI training on data and sensor/signal processing technologies for behavior analysis; a framework for the legal considerations regarding these technologies and models; and generated knowledge from diverse seniors and precariously housed youth.

Risk Analysis
Our project proposes a completely new theory and tools for supporting a model of social inclusion where seniors host precariously housed youth in their homes, based on AI-sensing techniques to improve safety and security for both populations. We will not simply deploy AI techniques in an existing scenario. Rather, we will design a comprehensive participatory research approach based on established health and social sciences data collection practices to guide the design and assess the efficacy of our technological approach, providing a solution that is responsibly informed by stakeholders' unique needs. This approach has the potential to radically challenge established theories of social inclusion and housing by exploiting the synergies between computer engineering, health sciences, social sciences, and law. It will explore technology intrusiveness, cost, privacy and security concerns (e.g., authentication, confidentiality), and AI training for high-level analytics based on labeled data that can identify affect [63]. The literature reveals seniors are willing to forgo privacy (by accepting technologies in their homes) to live independently longer and/or to increase personal safety [44]. However, we do not know if people are willing to voluntarily sacrifice privacy to use technology for safety when opening their home to a stranger. We will carefully coordinate with the appropriate ethics board(s) to address these issues.
Our understanding of interpersonal relationships, and how they fit within the context of sustainable homelessness prevention and decreased social isolation, will be enhanced by coupling AI techniques with qualitative data gathering through questionnaires and focused interviews. These data analytics will be framed and explored within a legal framework, exposing the intricacies of multiple issues. For example, there is an inherent risk in using technology to manage the safety of vulnerable populations (seniors and precariously housed youth). This risk is heightened where there is a lack of understanding of, or effort to address, the underlying reasons for their vulnerability. We also recognize that this technology does not address upstream issues or factors that lead to social isolation and/or precarious housing, or other ways in which the two populations are marginalized. There is also a risk that participants will modify their behavior under surveillance, which may lead to inaccurate interpretation of monitored behavior. Additionally, issues may arise when two strangers share space, hindering host recruitment.
From a legal standpoint, there are two key sources of risk: (1) The collection, retention, and analysis of data raise serious privacy concerns. Subjects' intimate and private home life will be recorded, scrutinized, and coded. Even if the data collected do not include visual/audio recordings, the study is highly intrusive for all subjects. Great care must be taken to ensure research subjects are fully informed and understand the privacy implications, and strong security measures must be in place to safeguard all collected data. The deployment of the proposed technology raises additional privacy concerns: data collected through its use, and the technology itself, could conceivably be used as a tool for state surveillance. This technology is by no means "single use" and could conceivably be re-purposed to monitor other "undesirable" behavior. (2) Future implementation of this model may lead to the risk of unsafe or manipulative events arising in participant homes. Pairing strangers together in close quarters, especially when participants may be vulnerable as a result of their age, health, or socioeconomic status, increases the potential for complex situations. This will be mitigated in our research through extensive screening of our RAs and appropriate homeshare pairing (closely monitored by senior research team members).
This approach is at the interface of several disciplines (Engineering, Health Sciences, Social Sciences, Law). We will combine the data gathering methodologies prevalent in the social and health sciences (e.g., focused interviews, stakeholder questionnaires) to label sensing data for AI training and deployment, which in turn provides insights that fuel subsequent qualitative data gathering. Orthogonally, all deployment and technical development will be supported by an analysis of the legal implications of the approach. This positive feedback loop goes beyond established approaches in each discipline, bringing them together in a novel way. On the engineering side, we will research whether mixed methods data gathering from the social and health aspects of our research can be used to label complex data for AI training effectively.
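The labeling loop just described can be sketched as follows. This is a hypothetical simplification under stated assumptions: the feature names, mood labels, and the nearest-centroid "model" are illustrative stand-ins for the real sensing pipeline and AI training:

```python
from statistics import mean

# Sensor features per day: (co-occupancy hours, peak speech volume in dB).
sensor_windows = {
    "day1": (4.0, 55.0),
    "day2": (0.5, 40.0),
    "day3": (3.5, 58.0),
    "day4": (1.0, 42.0),
}

# Labels drawn from each day's questionnaire/interview (self-reported mood):
# the mixed methods data gathering acts as the labeling mechanism.
interview_labels = {"day1": "positive", "day2": "low",
                    "day3": "positive", "day4": "low"}

# Join sensor data with mixed-methods labels into a supervised training set.
training = [(sensor_windows[d], interview_labels[d]) for d in sensor_windows]

def train_centroids(data):
    """Trivial nearest-centroid classifier over the labeled windows."""
    by_label = {}
    for features, label in data:
        by_label.setdefault(label, []).append(features)
    return {lbl: tuple(mean(dim) for dim in zip(*rows))
            for lbl, rows in by_label.items()}

def predict(centroids, features):
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(centroids, key=lambda lbl: sq_dist(centroids[lbl]))

centroids = train_centroids(training)
print(predict(centroids, (3.8, 57.0)))  # classifies an unlabeled day
```

The open research question is precisely whether labels of this qualitative origin are consistent and fine-grained enough to train much richer models than this sketch.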

Projected Rewards
Our proposed research has the potential to reduce the social isolation of senior hosts and a number of homeless youth by providing safe and alternative ways to address affordability and create social connections for both populations. Technology that can predict and trigger a response to unsafe behavior without relying on audio/visual recordings could be used in a variety of settings. There have been growing complaints that prisons, mental health centers, youth homes, and elderly living facilities lack sufficient staff to safeguard residents and employees. Often there are simply too few staff to monitor all patients, offenders, or residents to identify risky behavior or respond quickly to emergencies. While this technology is not intended to serve as a replacement for adequate staffing, it would augment available staff by facilitating timely responses to address developing situations before they become unsafe, while still providing residents with a measure of privacy.
A large and diverse community will be impacted by our research. With our unique connections and ongoing partnerships with various organizations, our specific focus will be on marginalized seniors (e.g., Indigenous, ethnic-cultural minorities, LGBTQ+) and their current experience using and perceiving technology, particularly at home. There is a dearth of knowledge in this area, so our study will support other researchers, policymakers, and organizations working to ensure that technology and homesharing developments are inclusive of diverse seniors' perspectives. On the technology side, we will develop new AI training methodologies, based on labeling sensor data through participants' questionnaire/interview responses. These can be translated to other domains, using other forms of data labeling. With the proliferation of AI limited only by the availability of high-quality labeled data for training, our results will impact the wider AI community. Whilst in-home sensing technology has advanced substantially in the field of health care (e.g., fall detection, dementia support), its use for more ambiguous data gathering (e.g., mood, behavior) is less understood. We will investigate the suitability of sensing technology for these forms of data, and likely develop new sensor fusion methods that will be of use to a wide range of sensing applications.
By leveraging AI within a holistic societal, health, and law framework, our proposed research has the potential to result in a high-reward alternative model for housing that encompasses social inclusion, well-being, technology to enhance safety, and legal considerations. We believe the outcomes of this research project will change the direction of the social sciences, by reframing them to include technology and health, and will unlock a new area of discovery for holistic social AI.

Conclusions
In this manuscript, we reviewed the state of the art on technology-based positive social disruption, focusing on challenges, opportunities, and recent trends. We described social barriers to the adoption of such models of social computing and methods to overcome them, highlighting technological limitations that must be addressed.
Our main conclusion, based on our critical literature review, is that technological solutions do not suffice in isolation, and must instead be embedded within a socially conscious methodology, with input from relevant stakeholders, informed by practitioners outside of engineering, and applying qualitative methods. However, this does not yet seem to be the norm. We have shown some limited applications of such a methodology in our use-case (e.g., community consultations, focus groups, interviews), and will continue to explore ways to increase such social-consciousness towards ubiquitous, ethically designed deployment of social computing.