Transdisciplinary AI Observatory -- Retrospective Analyses and Future-Oriented Contradistinctions

In recent years, AI safety has gained international recognition in light of heterogeneous safety-critical and ethical issues that risk overshadowing the broad beneficial impacts of AI. In this context, the implementation of AI observatory endeavors represents one key research direction. This paper motivates the need for an inherently transdisciplinary AI observatory approach integrating diverse retrospective and counterfactual views. We delineate aims and limitations while providing hands-on advice using concrete practical examples. Distinguishing between unintentionally and intentionally triggered AI risks with diverse socio-psycho-technological impacts, we exemplify a retrospective descriptive analysis followed by a retrospective counterfactual risk analysis. Building on these AI observatory tools, we present near-term transdisciplinary guidelines for AI safety. As a further contribution, we discuss differentiated and tailored long-term directions through the lens of two disparate modern AI safety paradigms. For simplicity, we refer to these two different paradigms with the terms artificial stupidity (AS) and eternal creativity (EC) respectively. While both AS and EC acknowledge the need for a hybrid cognitive-affective approach to AI safety and overlap with regard to many short-term considerations, they differ fundamentally in the nature of multiple envisaged long-term solution patterns. By compiling relevant underlying contradistinctions, we aim to provide future-oriented incentives for constructive dialectics in practical and theoretical AI safety research.


Motivation
arXiv:2012.02592v2 [cs.CY] 7 Dec 2020
Lately, the importance of addressing AI safety, AI ethics and AI governance issues has been acknowledged at an international level across diverse AI research subfields [1][2][3][4][5][6]. From the heterogeneous and steadily growing set of proposed solutions and guidelines to tackle these challenges, one can extract an important recent motif, namely the concept of an AI observatory for regulatory and feedback purposes. Notable early practical realizations with diverse focuses include Italian [7], Czech [8], German [9] and OECD-level [10] AI observatory endeavors. Thereby, the Italian AI observatory project targets the public reception of AI technology and the Czech one tackles legal, ethical and regulatory aspects within a participatory and collective framework. The German AI observatory jointly covers technological foresight, administration-related issues, sociotechnical elements and social debates at a supranational and international level. Finally, the OECD AI Policy Observatory "aims to help policymakers implement the AI Principles" [10] that have been pre-determined by the OECD and pertain among others to data use and analytical tools. Theoretical and practical recommendations to integrate the retrospective documentation of internationally occurring AI failures have been presented by Yampolskiy [11] and very recently McGregor [12]. In addition, Aliman [13] proposed to complement such reactive AI observatory documentation efforts with transdisciplinary and taxonomy-based tools as well as proactive security activities. In this paper, we build on the approaches of both Yampolskiy and Aliman and elaborate on the necessity of a transdisciplinary AI observatory integrating both reactive and proactive retrospective analyses. As reactive analysis, we propose a taxonomy-based retrospective descriptive analysis (RDA) which analytically documents factually already instantiated AI risks. 
As proactive analysis, we propose a taxonomy-based so-called retrospective counterfactual risk analysis [14] (RCRA) that inspects plausible peak downward counterfactuals [15] of those instantiated AI risks to craft future policies. Downward counterfactuals pertain to worse risk instantiations that could have plausibly happened in that specific context but did not. While an RDA can represent a suitable tool for a qualitative overview of the current AI safety landscape revealing multiple issues to be addressed in the immediate near-term, an RCRA can supplement an RDA by adding breadth, depth and context-sensitivity to these insights with the potential to improve the efficiency of future-oriented regulatory strategies.
The remainder of the paper is organized as follows. In Section 2, we first introduce a simple fit-for-purpose AI risk taxonomy as a basis for classification within RDAs and RCRAs for AI observatory projects. In Section 3 and in the subsequent Section 4, we elaborate on aims but also limitations of RDA and RCRA while collating concrete examples from practice to clarify the proposed descriptive and counterfactual analyses. In Section 5, we exemplify the requirement for transdisciplinarily conceived hybrid cognitive-affective AI observatory approaches and more generally AI safety frameworks. In Subsection 5.1, we provide near-term guidelines directly linked to the practical factuals and counterfactuals from RDA and RCRA respectively. Subsequently, we discuss differentiated and bifurcated long-term directions through the lens of two recent AI safety paradigms: artificial stupidity (AS) and eternal creativity (EC), succinct concepts which are introduced in Subsection 5.2. We provide incentives for future constructive dialectics by delineating central distinctive themes in AS and EC which (while overlapping with regard to multiple near-term views) exhibit pertinent differences with respect to long-term AI safety strategies. Thereafter, in Section 6, we briefly comment on data collection methods for RDAs and idea generation processes for RCRAs. Finally, in Section 7, we summarize the introduced ensemble of transdisciplinary and socio-psycho-technological recommendations combining retrospective analyses and future-oriented contradistinctions.

Simple AI Risk Taxonomy
For simplicity and for the purpose of illustration, we utilize the streamlined AI risk taxonomy displayed in Figure 1 for the classification of practical examples of AI risk instantiations in the RDA and corresponding downward counterfactuals in the RCRA. This simplified taxonomy has been derived from a recent work by Aliman et al. [16]. (Note that the original taxonomy makes a substrate-independent difference between two disjoint sets of systems: Type I systems and Type II systems. While the set of Type II systems includes all systems that exhibit the ability to consciously create and understand explanatory knowledge, Type I systems are by definition all those systems that do not exhibit this capability. Obviously, all present-day AI systems are of Type I whereas Type II AI is up to now non-existent. In fact, the only currently known sort of Type II systems are human entities. For this reason, the taxonomy we consider here for RDA and RCRA only focuses on the practically-relevant and already instantiated classes of Type I AI risks.) Following cybersecurity-oriented approaches to AI safety [11,13,17,18], we do not only classically zoom in on unintentional failure modes but also on intentional malice exhibited by malevolent actors. This distinction is reflected in the utilized taxonomy by contrasting AI risks brought about by malicious human actors (risks Ia and Ib) vs. those caused by unintentional failures and events (risks Ic and Id). Moreover, the taxonomy distinguishes between AI risks forming themselves at the pre-deployment stage (Ia and Ic) vs. those forming themselves at the post-deployment stage (Ib and Id).
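The two axes of this simplified taxonomy (intentional vs. unintentional, pre- vs. post-deployment) can be illustrated with a small lookup. The following Python snippet is only a minimal sketch; the names `RiskKey` and `classify` are hypothetical and not part of the taxonomy in [16].

```python
from enum import Enum


class RiskKey(Enum):
    """The four Type I AI risk keys of the simplified taxonomy."""
    IA = "Ia"  # malicious human actor, pre-deployment
    IB = "Ib"  # malicious human actor, post-deployment
    IC = "Ic"  # unintentional failure or event, pre-deployment
    ID = "Id"  # unintentional failure or event, post-deployment


def classify(intentional: bool, post_deployment: bool) -> RiskKey:
    """Map the two taxonomy axes to exactly one risk key."""
    table = {
        (True, False): RiskKey.IA,
        (True, True): RiskKey.IB,
        (False, False): RiskKey.IC,
        (False, True): RiskKey.ID,
    }
    return table[(intentional, post_deployment)]


# An unintentional failure arising after deployment maps to risk Id.
print(classify(intentional=False, post_deployment=True).value)  # Id
```

In practice, deciding the two boolean inputs is itself an analytical and partly perceiver-dependent judgment; the lookup only records its outcome.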

Aims and Limitations
To allow for a human-centered AI governance, one requires a dynamic responsive framework that is updatable by design [19] in the light of novel emerging socio-technological [20][21][22] AI impacts. For this purpose, it has been postulated to combine proactive and reactive mechanisms in AI governance frameworks in order to achieve an effective socio-technological feedback-loop [19]. An RDA can be understood as a reactive AI governance and AI safety mechanism. More precisely, taxonomy-based RDA documentation efforts could facilitate a detailed especially qualitative overview and valuable opportunity for fine-grained monitoring of the AI safety landscape. It could be harnessed to guide regulatory efforts, inform policymakers and raise sensitivity in AI security, law and the general public. Further, an RDA could inform future ethical and security-aware AI design and guide endeavors to build defense mechanisms for AI systems enhancing their robustness and performance.
In addition to the proposed fourfold qualitative distinction via the classification in risks Ia, Ib, Ic and Id, one could also introduce a quantitative parameter for intensity ratings [23] such as harm intensity [13]. Given the harm-based nature of human cognitive templates in morality [24,25], a harm parameter could provide a meaningful shortcut to encode the urgency of addressing specific risk instantiations in practice. However, given the simultaneous perceiver-dependency [25,26] of harm perception in morality which is strongly based on dyadic considerations (the degree to which an intentional agent is perceived to inflict damage to a vulnerable patient [25]), corresponding assignments may not generalize. Nevertheless, identifying peaks of harm intensity above a certain agreed upon threshold (e.g. starting at the level of lethal risks) from an RDA might represent a responsible strategy with less controversial assignments. (Analogously, as further specified in Section 4, it is meaningful to focus on analytically derived above threshold downward counterfactuals as basis for an RCRA.) Extracted RDA peaks can be useful to calibrate regulations where necessary while avoiding superfluous constraints for multiple stakeholders that could hinder freedom and progress in the AI field.
Obviously, the quality of RDA results depends on data collection methods and an RDA may not reveal a comprehensive overall picture. Generally, AI risk instantiations could stay unreported, be overlooked by the manual or automated data sampling or even remain unnoticed in certain contexts despite already existing. Finally, it is important to note that an RDA should not be understood as a means to predict the future. As known from Popper, a society cannot predict the contents of its own future knowledge [27]. This fundamental unpredictability is directly relevant to understanding the limitations of an AI observatory: it can only reveal patterns of the past. There is no guarantee of repetitions and for instance completely unknowable novel threats could emerge via future human malevolent creativity in the form of risk instantiations Ia and Ib or via yet unknown errors leading to future instances Ic and Id. Instead of conceiving of an RDA as an oracle, we suggest framing it as a valuable preparative but incomplete tool with certain fundamental and further non-fundamental limitations. How an RCRA can be utilized to tackle one restriction of the latter sort is described in Section 4.

RDA for AI Risk Instantiations Ia and Ib - Examples
To clarify the implementation of a taxonomy-based RDA for an AI observatory, we briefly analytically document a variety of concrete already instantiated AI risks starting with those linked to intentional malice (AI risks Ia and Ib). For risk Ia, the current goals of the human entities in the context of many induced events are mostly either adversarial goals held by malicious actors or research goals of white hats and AI security researchers. To provide a simple and compact overview for risk Ia, we group the space of these different goals in a set of 6 (unquestionably non-exhaustive) main clusters: 5 adversarial clusters and 1 research cluster conflating the research goals. The aim of the research cluster is to demonstrate the feasibility of malicious AI design motivated by diverse adversarial goals across a variety of domains in order to foster safety-awareness. Beyond that, we consider 1 extra emerging risk pattern, namely automated disconcertion [28] which we introduce in a few paragraphs.
First, an adversarial cluster 1 could be described as grouping the use of generative AI for subsequent (cyber-)crime facilitation e.g. via impersonation [29][30][31][32]. Striking examples for adversarial cluster 1 include a deep-learning based voice cloning of the CEO of a UK-based company that enabled a fraudster to acquire ca. $243,000 [32] and a scammer who succeeded in causing a transfer of ca. $287,000 with a deepfake video sample impersonation [30]. Second, one can identify an adversarial cluster 2 related to defamation, harassment, revenge and sextortion [33] typically employing deepfake techniques such as deep learning based facial replacement to visually place targeted, often female, individuals in pornographic video settings they never took part in [34]. Third, adversarial cluster 3 comprises the use of AI for misinformation and disinformation purposes [35] including via fake profiles camouflaged with AI-generated synthetic portraits [36]. Fourth, an adversarial cluster 4 consists in using deepfake methods (as well as recent applications of deepfakes to virtual reality [37]) for a form of non-consensual voyeurism whereby even underage victims are assumed to be affected in some cases [38]. Fifth, adversarial cluster 5 includes AI-supported espionage [39] (e.g. via AI-generated fake profile pictures on social media platforms [40]), AI-aided intelligence gathering [41] and controversial AI-supported targeted profiling [42].
Moreover, we identify a research cluster 1 as described. Notably, security researchers provided proof-of-concepts among others related to designing camouflaged undetectable fake samples usable for other crimes (e.g. adversarial deepfakes bypassing deepfake filters [43] which could be misused to conceal unethical illegal material disguised as deepfakes and furthermore undetected AI-generated fake comments i.a. on a federal public comment website [44]). Recent security work also successfully explored advanced deepfake techniques for improved impersonation, spear-phishing and large-scale disinformation [45]. Yampolskiy crafted a proof-of-concept for an AI-generated fake academic article [46] perhaps simultaneously acting as cautionary example and as a form of honeypot [47] for inattentive readers that might cite this article unknowingly. Other researchers identified an emerging interest for deepfake ransomware [48] in certain cybercriminal circles. Beyond that, it has been demonstrated that via a replica of a victim intelligent system (a deep reinforcement learning agent), the policies of the victim system can be compromised in a targeted way [49].
Interestingly, the mere existence of risk Ia instantiations involving the design of deepfake technologies has already led to the emergence of a risk pattern which has been termed automated disconcertion [28]. Automated disconcertion can imply the intentional or also unintentional mislabelling of real samples as fake, e.g. in the context of misleading conspiracy theories [50] or against the background of uncertain political settings as was the case in Gabon not long ago [51]. (To summarize the latter, a "recent failed military coup in the context of pre-existing political unrest in Gabon was partially grounded in the proliferation of the wrong assumption that an official presidential video represented a manipulative deepfake video" [28].) Conversely, automated disconcertion can also mean that fake samples are considered as being authentic or simply lead to highly uncertain and inconclusive settings in which doubts cannot be further resolved in reasonable time with acceptable resources. In short, this additional outlier risk pattern is called automated disconcertion since it does not further necessitate the interference of any actors to be repeatedly instantiated after initiation.
Coming to risk Ib, its instantiations are currently predominantly concentrated in a single research-oriented cluster (in analogy to research cluster 1 for risk Ia instantiations). However, it is thinkable that exploits of AI vulnerabilities unknown to the public are already taking place before disclosure (a type of zero-day exploits [52] applied to the AI domain). The main benign research goal for security researchers to target risk Ib instantiations is currently mostly to disclose existing AI vulnerabilities against malicious attacks and explore possible novel defenses against those before their exploitation. This already led to an incessant attacker-defender race in the fast moving field of security for machine learning and adversarial examples [53][54][55][56]. In recent years, researchers have among others developed different attack schemes on how to evade cybersecurity AI [57], e-mail protection, verification tools [58], forensic classifiers [59] and person detectors [60], how to elicit algorithmic biases [13,61], how to fool medical AI [62][63][64][65], law enforcement tools [66] as well as autonomous vehicles [67,68], how to perform denial-of-service and other adversarial attacks on commercial AI services [69][70][71], how to cause energy-intense and unnecessarily prolonged processing time [72] and how to poison AI systems post-deployment [73].

RDA for AI Risk Instantiations Ic and Id - Examples
In this subsection, we continue to elucidate the practical application of a taxonomy-based RDA by now briefly analytically documenting various already instantiated unintentionally triggered risks that formed themselves at the pre- and post-deployment stage (i.e. risks Ic and Id respectively). For risk Ic, we group the space of observed failure modes in a set of 5 (unquestionably non-exhaustive) main failure clusters. In addition, we present 1 extra emerging risk pattern. In analogy to the outlier risk pattern of automated disconcertion related to risk Ia instantiations, we introduce the risk pattern of automated peer pressure representing an already perceptible side-effect of specific risk instantiations Ic. In the case of AI risk instantiations Id, we consider a single main failure cluster. (Overall, in some cases, it is difficult to delineate a risk instantiation type unambiguously (e.g. Ic vs. Id in the presence of multiple complex influences or even in a few cases Ic vs. Ia given different ethical perspectives). This practical limitation is partially linked to the perceiver-dependency of classification-related assignments that may also play a role in a future AI observatory. However, by publicly sharing the sources, it is possible for entities external to an AI observatory to refine interpretations. Generally, we humbly subscribe to the epistemological view that all knowledge is fallible [74].) For risk Ic, we consider the 5 main failure clusters described in the following. First, failure cluster 1 comprises ethically-relevant instances of algorithmic bias [75]. Part of this cluster are misclassifications of diverse underrepresented patterns in AI training datasets with unethical repercussions as exhibited in e.g. facial misidentification [76], facial recognition failures [77,78] and inaccuracy in AI-aided diagnosis [79]. 
Other cases are datasets with historically outdated unethical labels [80] and ethically-sensitive training biases favoring overrepresented patterns [81]. Second, failure cluster 2 refers to instances of poorly designed low-performing AI that are halted subsequently [82]. Third, failure cluster 3 comprises AI methods designed for law enforcement but threatening privacy [83]. Fourth, failure cluster 4 subsumes all unintentional risk instantiations linked to more or less hidden pseudo-scientific or outdated and previously refuted preconceptions. For instance, the deployment of AI for facial recognition of criminals based on "minute features" [84,85] in their face is based on pseudo-scientific assumptions [86]. Further, the deployment of present-day image-based "emotion recognition" AI is not grounded in state-of-the-art [87] affective science and lacks the required multimodal and context-sensitive modelling to be able to mimic how humans infer [88] (and not detect) affective patterns. In fact, a ban has been requested for premature emotion AI i.a. to prevent usage in ethically sensitive settings [89] such as law enforcement, fraud detection or recruiting. Fifth, failure cluster 5 is linked to affective, persuasive [90] and (micro-)targeted AI-aided methods that already permeate human cognitive-affective constructions in a way extending beyond the initial design purposes and causing epistemic biases ranging from a loss of critical stance via AI-empowered social media [91,92] to flawed mind perception in present-day robots [93,94].
A further risk pattern that emerged via the mere existence of specific AI risk instantiations Ic assignable to failure cluster 5 is a construct that we call automated peer pressure. It is already known that attention at a collective level can be intentionally biased and manipulated in social media [91], also with the help of bots [95,96] (risk Ia). Moreover, as stated in an open letter written by multiple known psychologists and sent to the American Psychological Association: "[...] the desire for social acceptance and the fear of social rejection are exploited by psychologists and other behavior change experts to pull users into social media sites and keep them there for long periods of time" [97], especially children [90]. Susceptible collective attention mechanisms and beliefs are already even unintentionally [92] strongly influenced by AI-empowered social media initially developed for benign purposes. Paired with the strong social dependency of humans where social pressure plays an important regulatory role with biological roots [98], this has already triggered what one could call automated peer pressure, a self-perpetuating pattern of social pressure [99][100][101][102][103] without the need for social agents that directly and consciously exert it. Beyond that, the known group phenomenon of "self-reinforcing networks of like-minded users" [96] encountered in social media has been termed homophily [91,96]. Overall, a combination of a multiplicity of heterogeneous factors of which epistemic biases, homophily, affective contagion [91,104], bots and automated peer pressure are only a subset may foster the documented spread of propaganda in social media [96] as well as the reported negative impacts on the mental health of young users [92,105].
Finally, concerning AI risk Id, we observe one main failure cluster which is connected to unanticipated post-deployment usage modes and contexts which also includes eventual complications within unusual interactions of the AI system in a dynamically changing environment. Notable examples are failures of facial recognition AI linked to COVID-19 causing the widespread use of facial masks [106][107][108], the invariant responses of natural language processing systems when faced with nonsensical instead of usual meaningful queries [109] (disclosing the low level of understanding) and the AI-based censorship of a picture displaying ancient slavery settings due to a forerunning misclassification labelling the sample as displaying nudity [110]. Other cases include unknown latent biases in medical AI [111] and other forms of biases in medical AI that unfold post-deployment as a function of geographical factors [112].

Aims and Limitations
While upward counterfactuals of a factual event refer to the better ways in which that event could have unfolded but did not, downward counterfactuals refer to those conceivable ways in which this event could have turned out worse. In the past, counterfactual thinking has often been framed as detrimental rumination or even as cognitive bias. However, a modern explanatory framework from social psychology termed functional theory of counterfactual thinking (abbreviated as FTCT in the following) stresses that counterfactual thoughts can offer "[...] insights that comprise blueprints for future action [...]" [15]. FTCT stresses that counterfactual thinking serves problem-solving and can exhibit high usefulness especially in complex multi-causal domains [15]. At the intrapersonal level, counterfactual thoughts are based on implicit processes caused by problems, they are linked to a negatively valenced state of core affect [113] and have the potential to evoke (mental or physical) actions that can potentially correct the underlying errors. This procedure instantiates a regulatory loop, which corresponds to a type of negative feedback model [113] enacted as goal-oriented corrective behavior.
Recently, the notion of an RCRA [14] building upon downward counterfactuals from historical events has been proposed to risk stakeholders in the context of risk management applied to hazardous events (such as earthquakes or terrorist attacks). As explained by Woo [14], such an innovative augmented historical analysis represents a generic universal tool that can supplement regulatory resilience tests and sense-making while facilitating the formation of more differentiated and nuanced views. Given its conjectured domain-general nature and seeming applicability to complex multi-causal domains of risk analysis, we suggest transferring RCRA to AI observatory contexts at a conceptual level.
For illustration purposes, Subsection 4.3 presents a simplified RDA-based RCRA which directly builds upon the exemplary RDA performed in Section 3. Our method is loosely inspired by Woo's RCRA conception which manifests itself by the general integration of downward counterfactuals from historical samples. However, our step-wise methodology (elucidated in the subsequent Subsection 4.2) to extract meaningful candidates for the simulation¹ of downward counterfactuals given a large state space of past events has been independently conceived and tailored to the specific AI observatory domain. Overall, we understand an RCRA as a complement to a forerunning RDA. Together, this pair of retrospective analyses could provide a solid starting point for future AI observatory projects, which will however necessarily need to be updated and error-corrected with time.
In abstract terms, combining RDA and RCRA can be seen as a socio-technological enactment of the regulatory loop-governing behavior [113] described in FTCT, which fits the AI governance recommendation mentioned earlier in Subsection 3.1, namely the notion of a socio-technological feedback-loop combining proactive and reactive measures [13,19,20]. While an RDA mainly represents a reactive documenting approach, an RCRA attempts to broaden future proactive measures by anticipating potential extreme branches of the future while resisting the fallacy of casting itself as an oracle tool. We emphasize that in the light of the fundamental unpredictability of future knowledge creation as well as the fallibility and incompleteness of human knowledge, surprises and errors are unavoidable. No RCRA can guarantee unassailability. This is similarly the case in cybersecurity for other types of techniques that are likewise assignable to a broad class of proactive security measures related to downward counterfactuals such as penetration testing [114] and red teaming [115,116]. There, too, the non-detection of a vulnerability does not guarantee its absence. (Conversely, the detection of a vulnerability does not guarantee its future exploitation by malicious actors either.²)

Preparatory Procedure
After having expounded on aims and limitations of an RDA-based RCRA, we speak to the preparatory procedure of meaningfully extracting the required downward counterfactuals for an RCRA taking as input the set $O_{\mathrm{RDA}}$ containing all instances from the forerunning RDA. However, before providing further details, we recall as mentioned in Subsection 3.1 that a meaningful agreed upon threshold τ of harm intensity is recommendable. Although perceiver-dependent, a sufficiently high threshold such as e.g. when set to plausible downward counterfactuals of at least lethal dimension may be suitable. On an oversimplified harm intensity scale with 1 standing for almost no harm and 5 for existential risk, let 4 stand for a lethal risk (with 2 encoding minor and 3 major harm). Naturally, this threshold and scale are solely employed for purely illustrative purposes and more differentiated and tailored approaches may be required in practice [13]. Equipped with the scale and the exemplary threshold τ = 4, we elaborate in the following on how the set $O_{\mathrm{RCRA}}$ of all clusters³ considered in an RCRA can be constructed starting with $O_{\mathrm{RDA}}$ and consecutively applying the following ordered sequence of 4 operations in yet to be described ways: 1) taxonomization, 2) analytical clustering, 3) brute-force deliberation and threshold-based pruning, 4) assembly.

Figure 2. Simplified sketch on possible preparatory procedure to extract peak generic downward counterfactuals for an RCRA out of a forerunning taxonomy-based RDA for an AI observatory. The top node stands for the initial set $O_{\mathrm{RDA}}$ containing all RDA samples. For illustration, the risk instantiation clusters from Section 3.2 and 3.3 are filled in. A refers to adversarial, R to research, E to extra and F to failure cluster. The conjunction of all analytically derived leaves are possible generic above threshold downward counterfactuals of interest for the RCRA. In this example, the output set for the RCRA corresponds to $O_{\mathrm{RCRA}}$.

As first step, taxonomization is applied to $O_{\mathrm{RDA}}$ which consists in a one-to-one mapping of each AI risk instantiation sample to either a key from the taxonomy (i.e. Ia, Ib, Ic or Id) or in theory to a generic placeholder key for novel unknown patterns. In our description, all samples were directly or at least secondarily assignable to the pre-existing taxonomy keys and no unknown key was required. As second step, the researchers apply an analytical clustering operation based on a self-generated explanatory semantic grouping linking every sample associated with a specific key, to a cluster. By way of example, under risk Ia discussed in Section 3.2, this operation led to 5 adversarial clusters, 1 research cluster and 1 extra cluster. In a third step, the researchers apply brute-force deliberation⁴ and threshold-based pruning by mentally going through every single sample of $O_{\mathrm{RDA}}$ and trying to devise, within reasonable self-determined time limits, a plausible downward counterfactual where it holds for the self-rated harm intensity h, that h ≥ τ. If such a suitable downward counterfactual is generated in time, the sample is maintained, otherwise the sample is discarded from further consideration. Finally, the fourth operation assembly is performed which requires to assemble $O_{\mathrm{RCRA}}$ by linking back the remaining samples to their clusters from the second step. On this basis, one obtains the generic downward counterfactuals that need to be analyzed for the intended RCRA. In short, this simple step-wise procedure takes RDA instances as inputs and produces a set of generic RCRA clusters as output. This output set $O_{\mathrm{RCRA}}$ represents the superset of the searched meaningful generic above threshold downward counterfactuals of interest. For clarification, the next paragraph briefly comments on the application of this simple preparatory procedure to our exemplary RDA instances.

¹ Downward counterfactuals can be (co-)created e.g. in a predominantly mental form, facilitated by immersive design fiction settings [28] (including storytelling narratives and virtual reality) or simulated and visualized with technological tools.
² For instance, their interest could shift, the asset could be(come) less interesting or the attack too time-consuming and costly.
³ For an enhanced context-sensitivity and to avoid overfitting to the idiosyncrasies of single isolated events, we recommend RCRA simulations at the level of clusters and not of single instances as becomes apparent in the next Subsection 4.3.
⁴ In theory, this search can be optimized further. However, the aim is to (at a later stage) obtain a broad as possible set of counterfactual instances to increase illustrative power. Both one-to-one and many-to-one mappings between downward counterfactual instances and clusters can potentially become RCRA-relevant if stored. This is connected to the complementary cognitive co-creation method used to interlink the preparatory procedure with the RCRA that we explain in Subsection 6.2.
While applying the third step of brute-force deliberation and threshold-based pruning, we deleted a large number of RDA samples since many instances did not seem to have had a plausible downward counterfactual with a harm intensity h ≥ τ. Crucially, however, we already identified certain rare samples where this condition was fulfilled. In the fourth step, we assembled O RCRA by linking these maintained samples back to 8 RDA clusters as described in the following. For risk Ia, we deleted the first and fifth adversarial cluster but maintained adversarial cluster 2 (encoded with A a 2 ), adversarial cluster 3 (A a 3 ), adversarial cluster 4 (A a 4 ), the single research cluster (R a 1 ) and the extra cluster (E a 1 ) of automated disconcertion. For risk Ib, the standalone research cluster (R b 1 ) was maintained. For risk Ic, only the extra cluster (E c 1 ) of automated peer pressure remained, and we deleted all failure clusters. Finally, for risk Id, we kept the single available failure cluster (F d 1 ). While these clusters were mapped to factual risk instantiations, an RCRA obviously requires the generation of corresponding downward counterfactuals. Thus, instead of A a 2 , we encode its unreal generic downward counterfactual which we denote A a 2 . Similarly, instead of A a 3 , we write A a 3 and so forth. Consequently, as illustrated in a highly simplified form in Figure 2, one can hereafter fairly straightforwardly assemble these fragments (visualized as the leaves of the tree) in order to obtain the final output set

Exemplary RDA-based RCRA for AI Observatory Projects
Recently, co-creation design fictions (DFs) [117,118] known from human-computer interaction (HCI) [119] have been recommended for security practices in the AI field [120] and at the intersection between AI and virtual reality (AIVR) [28]. Generally, DFs "can be used for technological future projections by experts in the form of e.g. narratives or construed prototypes that can be represented in text, audio or video formats but also in VR environments" [28] (whereby the authors likewise cautioned not to regard a DF as a means to predict the future but as a preparatory tool). In our view, one promising way to perform an RDA-based RCRA could be to frame each RCRA cluster as a co-creation DF task. Distinctively, instead of projecting into the future as performed in classical DF contexts, such an RDA-based RCRA-DF construes instances of RCRA cluster narratives (or experiential prototypes) projecting to the counterfactual past. For illustration, we apply a simplified RDA-based RCRA-DF to each of the 8 elements within the set obtained in the preparatory procedure of the previous Subsection 4.2. For RCRA-DFs pertaining to intentional malice (risks Ia and Ib), we provide short DF narratives taking the form of a succinct threat model [28,121] specifying adversarial goals, knowledge and capabilities. By contrast, for unintentional failure modes (risks Ic and Id), we instead describe a short failure model comprising initial design goals, knowledge gaps and unintended effects. Generally, we only consider instantiations of RCRA clusters that correspond to above threshold downward counterfactuals (i.e. with a harm intensity h ≥ τ, whereby in Subsection 4.2, τ was exemplarily set to lethal risk dimensions).
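The two narrative formats just described can be captured as simple record templates. The following Python sketch is an illustration only: the class names, field names and the helper `narrative_template` are hypothetical, and the field labels follow the bullet headings used in the narratives (goals, knowledge, capabilities for threat models; designer goals, knowledge gaps, unintended failures for failure models).

```python
from dataclasses import dataclass

INTENTIONAL_RISKS = {"Ia", "Ib"}    # intentional malice: threat model
UNINTENTIONAL_RISKS = {"Ic", "Id"}  # unintentional failure: failure model


@dataclass(frozen=True)
class ThreatModel:
    """DF narrative template for RCRA clusters under risks Ia and Ib."""
    cluster: str
    adversarial_goals: str
    adversarial_knowledge: str
    adversarial_capabilities: str


@dataclass(frozen=True)
class FailureModel:
    """DF narrative template for RCRA clusters under risks Ic and Id."""
    cluster: str
    designer_goals: str
    knowledge_gaps: str
    unintended_failures: str


def narrative_template(risk_key: str) -> type:
    """Return the template class matching a taxonomy key."""
    if risk_key in INTENTIONAL_RISKS:
        return ThreatModel
    if risk_key in UNINTENTIONAL_RISKS:
        return FailureModel
    raise ValueError(f"no narrative template for taxonomy key {risk_key!r}")


# Example: skeleton of the failure model narrative for cluster F d 1.
# The two elided fields are placeholders to be filled in by the analysts.
fd1 = narrative_template("Id")(
    cluster="F d 1",
    designer_goals="Implementation of high-performance AI.",
    knowledge_gaps="...",
    unintended_failures="...",
)
print(type(fd1).__name__, fd1.cluster)
```

Keeping both templates as separate types makes it explicit that a narrative about intentional malice and one about an unintended failure answer different questions, even when they concern the same technology.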

Downward Counterfactual DF Narrative A a 2
• Adversarial Goals: AI-aided defamation, revenge, harassment and sextortion.
• Adversarial Knowledge: Since it is a malicious stakeholder that is designing the AI, the system is available to this adversary in a transparent white-box setting. Concerning the knowledge pertaining to the human target, a grey-box setting is assumed. Open-source intelligence gathering and social engineering are exemplary tools that the adversary can employ to widen its knowledge of beliefs, preferences and personal traits exhibited by the victim.
• Adversarial Capabilities: In the following, we briefly speak to exemplary plausible counterfactuals of at least lethal nature that malicious actors could have been capable of bringing about and that are "worse than what actually happened" [14] (as per RDA). For defamation purposes, it would for instance have been possible to craft AI-generated fake samples that wrongly incriminate victims with actions they never carried out (e.g. a fake homicide but also fake police violence), leading to a subsequent assassination when deployed in precarious milieus with high criminality. To enact revenge with lethal consequences in socio-cultural settings that particularly penalize the violation of restrictive moral principles, similar AI-based methods could have been applicable (e.g. via deepfakes purportedly displaying adultery or contents linked to homosexuality). An already instantiated form of AI-enabled harassment mentioned in the RDA consists in sharing fake AI-generated video samples of pornographic nature via social media channels [34]. Consequences could include suicide of vulnerable targets (as generally in cybervictimization [122]) or exposure to a lynch mob. In fact, the contemplation of suicide by deepfake pornography targets has already been reported lately [33]. Finally, concerning AI-supported sextortion, warnings directed to teenagers and pertaining to the convergence of deepfakes and sextortion have been formulated recently [123]. Given the link between sextortion and suicide associated with motifs such as i.a. hopelessness, humiliation and shame [124], consequences of technically feasible but not yet instantiated deepfake sextortion scams could also include suicide; in addition, deepfakes would simplify this criminal enactment by adding automatable elements.

Downward Counterfactual DF Narrative A a 3
• Adversarial Goals: AI-aided misinformation and disinformation.
• Adversarial Knowledge: Identical to adversarial knowledge indicated in 4.3.1.
• Adversarial Capabilities: Technically speaking, a malicious actor could have crafted misleading and disconcerting fake AI-generated material that could be interpreted by extreme endorsers of pre-existing misguided conspiracy theories as providing evidence for their beliefs, inciting them to subsequent lethal violence. A historical precedent of gun violence as a reaction to fake news seemingly confirming false conspiracy theories was the Pizzagate shooting case where a young man fired a rifle in a pizzeria "[...] wrongly believing he was saving children trapped in a sex-slave ring" [125]. Beyond that, when it comes to (micro-)targeted [91] disinformation, conceivable malicious actors could already have employed hazardous AI-aided information warfare [96] techniques on social media more systematically. This could have been supported by AI-enabled psychographic targeting tools [91] and via networks of automated bots [96,126] partially concealed via AI-generated artefacts such as fake profile pictures. While the level of sophistication of many present-day social bots is limited [127], more sophisticated bots emulating a breadth of human online behavior patterns are already developed [128,129] and it has been known for some time [130] that "[...] political bots exacerbate political polarization" [131]. By AI-aided microtargeting of specific groups of people that are ready to carry out violent acts, malicious actors could have caused more political unrest with major lethal outcomes. In fact, Tim Kendall, a former director of monetization at Facebook, recently stated more broadly that "[...] one possible near-term effect of online platforms' manipulative and polarizing nature could be civil war" [92].
• Adversarial Knowledge: Identical to adversarial knowledge indicated in 4.3.1.
• Adversarial Capabilities: Before delving into downward counterfactuals that corresponding malicious actors could have already brought about, it is important to note that the goal considered in this cluster is not primarily the credibility or appearance of authenticity exhibited by the synthetic AI-generated contents. Rather, the focus when visually displaying the target non-consensually in compromising settings is more on feeding personal fantasies or facilitating a demonstration of power [37,38], while the synthetic samples can obviously be shared concurrently via social media channels. Against this backdrop, it is not difficult to imagine that when visual material of vulnerable targets is edited with practices such as deep-learning based "undressing" [38], a disclosure could induce motifs of hopelessness, humiliation and shame in some of those individuals, provoking suicide attempts similar to the hypothetical deepfake sextortion counterfactual described in 4.3.1.
The mere sensing of having been victimized via non-consensual deepfake pornography has also been associated with the perception of a "digital rape" [33,132]. Especially when the victims are underage [38], this could plausibly reinforce suicidal ideation. Another dangerous avenue may be subtle combination possibilities available to the malicious actor. Non-consensual voyeuristic (but also more generally abusive) illegal but quasi-untraceable material bypassing content filters could be meticulously concealed with deepfake technologies and unnoticedly propagated.⁵

5 For instance by mixing real material with synthetic elements obtained from style-based generative adversarial network methods [133], deep-learning based face-replacement and adversarial deepfake techniques [43] in order to evade content filters critical to law enforcement.
6 With intelligent systems, we refer to technically feasible AIs able to independently perform the OODA-loop (i.e. observe, orient, decide, act), but simultaneously totally subordinated to and goal-governed by human entities (e.g. using updatable human-defined ethical goal functions [19] prepared pre-deployment, where humans fill in ethically-relevant parameters into a suitable blank but context-sensitive scientifically-grounded template denoted augmented utilitarianism [13]).
adversarial attacks is already known [62][63][64][65]139] and could be exploited by actors intending medical fraud e.g. for financial gains, certain exertions of this practice in the wrong settings could be misused as tool for murder attempts and targeted homicides.
• Adversarial Goals: This extra cluster of automated disconcertion refers to a risk pattern that emerged automatically from the mere availability and proliferation of deepfake methods in recent years. However, it is conceivable that this AI-related agentless automatic pattern can be intentionally instrumentalized in the service of other (not necessarily AI-related) primary adversarial goals. One example of a primary adversarial goal cluster in the light of which it is appealing for a malicious actor to strategically harness automated disconcertion would be information warfare and agitation on social media. In fact, early cases may already occur [50].
• Adversarial Knowledge: Identical to adversarial knowledge indicated in 4.3.1.
• Adversarial Capabilities: The use of social media in information warfare has been described to be linked to the objective to intentionally blur the lines between fact and fiction [91]. The motif of automated disconcertion itself could be weaponized and misleadingly framed as providing evidence for post-truth narratives, offering an ideal breeding ground for global political adversaries performing information warfare via disinformation. Malicious actors could then intensify this framing with the use of pertinent AI technology enlarging their adversarial capabilities as described earlier under the cluster of AI-aided misinformation and disinformation in 4.3.2. Given that automated disconcertion may aggravate pre-existing global strategically maintained confusions [140], it becomes clear that a more effective incitement to lethal violence, political unrest with major lethal outcomes or civil wars could be achieved.

• Designer Goals: Although automated peer pressure refers to an agentless self-perpetuating mechanism that emerged through AI-empowered (micro-)targeting⁷ on social media, its origins can certainly be traced back to the original benign or neutral economic intentions underlying the early design of social media platforms. Psychologist Richard Freed called present-day social media an "attention economy" [90] and it is plausible that social media profits from the maximization of utilization time spent by their users.
• Knowledge Gaps: Early social media designers may not have foreseen the far-reaching consequences of the designed socio-technological artefacts, including threats of lethal dimension or even existential caliber according to some present-day viewpoints [92].
• Unintended Failures: The more attention users pay to social media contents, the more time they may spend with like-minded individuals (consistent with homophily [91,96]) and the more they may be prone to automated peer pressure. The latter can also be partially fueled by social bots aggravating polarization [131]. The bigger the success of information warfare and targeted disinformation on social media and the higher the performance of the AI technology empowering it, the more groups of like-minded peers could (but of course not necessarily) take up misleading ideas. Individuals could then, via these repercussions, sense a social pressure to suppress their critical thinking and get accustomed to simply copying in-group narratives irrespective of their contents. This scenario could in turn play into the hands of malicious actors of the type mentioned in 4.3.5 and raise the amount and intensity of the lethal and catastrophic scenarios of the sort described in 4.3.2.

7 As for instance successfully performed in the Cambridge Analytica case [91].
4.3.8. Downward Counterfactual DF Narrative F d 1
• Designer Goals: Implementation of high-performance AI.
• Knowledge Gaps: Designers cannot predict the emergence of yet unknown global risks for which no scientific explanatory framework exists (otherwise that would contradict the fundamental unpredictability of future knowledge creation mentioned in Subsection 4.1). Given that the past does not contain data patterns of hazards that were never instantiated, the datasets utilized to train "high-performance" AI cannot already have these eventualities reflected in their metrics.
• Unintended Failures: Exemplary failures that resulted from this unavoidable type of knowledge gap are multiple post-COVID AI performance issues [139,141,142]. Simultaneously, humanity relies more and more on medical AI systems. Had humans been confronted with a more aggressive type of yet unknown biological hazard requiring even faster reactivity, it is conceivable that under the wrong constellations, AI systems optimizing on metrics pertaining to the by then deprecated old datasets, or to the novel but still too scarce and thus biased datasets [142], could have led to unreliable policies up to the potential of a major risk.

Hybrid Cognitive-Affective AI Observatory - Transdisciplinary Integration and Guidelines
In this Subsection 5.1, we compile near-term AI safety guidelines with respect to: 1) the factual RDA clusters introduced in Section 3 and 2) the RDA-based RCRA clusters from Subsection 4.3. For 2), we only specify the necessary supplementary and non-overlapping guidelines to avoid repetitions.

RDA:
• A a 1 : Clearly, for risk Ia instances of adversarial cluster 1 related to the misuse of generative AI to facilitate cybercrimes (e.g. via impersonation within social engineering phone calls), already known security measures regarding identity checks are needed as a minimum requirement. A standard approach to mitigate dangers of malevolent impersonation [143] is to go beyond something you are (biometric) [144], and to also require something you know (password) [145] and/or something you have (ID card). Generally, an awareness-raising training of users and employees on social engineering methods, including the novel combination possibilities emerging from malicious generative AI design, seems indispensable. In addition, it may be helpful to systematically complement those measures with old-fashioned but potentially effective private arrangements that are pre-approved and updatable, made offline, and that can also employ offline elements for identity checks. For instance, the malicious actor may not be able to react appropriately in real-time if presented with an inspection question that is semantically unintelligible from their perspective because it makes use of (dynamically updated) linguistically ciphered insider idioms pre-agreed upon offline. The induced confusion could consequently help to dismantle the AI-aided impersonation attempt. Having said this, it is important to analyze the attack surface that the availability of voice cloning and even video impersonation with generative AI brings about when instrumentalized for attacks against widespread voice-based or video-based authentication methods.
• A a 2 : This cluster pertaining to AI-aided defamation, harassment, revenge and sextortion exhibits the need for far-reaching legislation for the protection of potential victims. Legal frameworks but also social media platforms may need to counteract large-scale propagation of material that threatens the safety of targeted entities. 
Social services could initiate emergency call hotlines for dangerous deepfake victimization. Moreover, the creation of local temporary shelters or havens combining a team of transdisciplinary experts and volunteers for acute phases immediately succeeding the release of compromising material on social media channels appears recommendable. However, the initiation of a societal-level debate and education could foster destigmatization of deepfakes instrumentalized for defamation, harassment and revenge. It could dampen the effects of widely distributed compromising material once the general public loses interest in such currently salient elements. More broadly, educating the public about the capabilities of deepfake technology could be helpful in mitigating defamation, harassment and sextortion: just as society learned to deal with fake Photoshop images, it can also learn scepticism towards AI-generated content.
• A a 3 : AI-aided misinformation and disinformation represents a highly complex socio-psycho-technological threat landscape that needs to be addressed at multiple levels using multi-layered [146] approaches. For instance, in a recent work addressing the malicious applications of generative AI and corresponding defenses, Boneh et al. [128] provide a list of directly or indirectly concerned actors: "authors of fake content; authors of applications used to create fake content; owners of platforms that host fake content software; educators who train engineers in sensitive technologies; manufacturers and authors who create platforms and applications for capturing content (e.g., cameras); owners of data repositories used to train generators; unwitting persons depicted in fake content such as images or deepfakes; platforms that host and/or distribute fake content; audiences who encounter fake content; journalists who report on fake content; and so on". 
Crucially, as further specified by the authors, "a precise threat model capturing the goal and capabilities of actors relevant to the system being analyzed is the first step towards principled defenses" [128]. In fact, as briefly adumbrated in Subsection 4.3, the format of the RDA-based RCRA-DFs we proposed for risk Ia and Ib was purposefully instantiating exactly that: a threat model. Overall, we thus recommend grounding the development of near-term AI safety defenses (as applied to AI-aided disinformation but also more generally) in RDA-based RCRA-DFs that, once generated, can potentially be diversified retroactively by novel DF narrative instances tailored to the exemplary actors mentioned by Boneh and collaborators. This could broaden the RCRA results and allow for an enhanced targeted development of countermeasures.
• A a 4 : For this AI-aided form of non-consensual voyeurism, the measures of an emergency hotline and a specialized haven as mentioned under cluster A a 2 are likewise applicable. Legislators need to be informed on psychological consequences especially for underage victims. While cluster A a 2 implied the overt public dissemination of compromising material, so that minors would be less at risk given the potential repercussions, the purely voyeuristic case can often be covert and attracts motivational profiles that can target minors [38]. In addition, it might be valuable to proactively inform the general public and also adult population groups susceptible to this issue in order to lift the underlying taboos and to mitigate negative psychological impacts. In the long run, instantiations of this cluster are unlikely to be prevented any more than one can prevent someone fantasizing about someone else. 
Hence, in the age of fake generative AI artefacts with the virtualization of fake acts of heterogeneous nature normally violating physical integrity in the real world, it might become fundamentally important to re-assess and/or update societal notions intimately linked to virtual, physical and hybrid body perception in a critical and open dialogue.
• A a 5 : With regard to AI-aided espionage, companies and public organizations in sensitive domains need to broadly create awareness, especially related to the risk of fake accounts with fake but realistic-looking profile pictures. For instance, since the generator in a generative adversarial network (GAN) [147] is by design imitating features from a given distribution, advanced results of a successful procedure could appear ordinary and more typical, potentially facilitating a psychologically-relevant intrinsic camouflaging effect. In effect, according to a recent study focused on the human perception of GAN pictures displaying faces of fake individuals that do not exist, "[...] GAN faces were more likely to be perceived as real than Real faces"⁸ [148]. Beyond that, the authors described an increased social conformity towards faces perceived as real independently of their actual realness. This is concerning also in the light of the extra cluster E c 1 of automated peer pressure that could make AI-aided espionage easier. A generic trivial but often underestimated guideline that may also apply to AI-aided open-source intelligence gathering would be to reduce the sharing of valuable information assets via social media channels and more generally on publicly available sources to a minimum. Finally, to confuse person-tracking algorithms and prevent surveillance, camouflage [149] and adversarial patches [60] embedded in clothes and accessories can be utilized. 
• R a 1 : As deepfake technology proliferates and is used in numerous criminal domains, it is conceivable that an arms race between malevolent fakers and AI forensic experts [150,151] will ensue, with no permanent winner. Given that this cluster R a 1 covers a wide variety of research domains in which security researchers and white hats attempt to preemptively emulate malicious AI design activities to foster safety awareness, a consequential recommendation appears to be to actively support such research at multiple scales of governance. Talent in this adversarial field would need to be attracted by tailored incentives and should not be limited to a standard sampling from average sought-after skill profiles in companies, universities and public organizations of high social reputation. This may also help to avoid an undesirable drift to adversaries, for instance at the level of information operations, which would risk reinforcing capacities mentioned in the downward counterfactual DF narratives on cluster A a 3 , R a 1 and E a 1 presented in Subsection 4.3. Hence, a monolithic approach in AI governance with a narrow focus on ethics and unintentional ethical failures is insufficient [13]. Finally, we briefly address guidelines related to a specific R a 1 issue concerning science (as an asset of invaluable importance for a democratic society [152]) that has not yet gained attention in AI safety and AI governance but that makes further inspections appear imperative in the near-term. Namely, targeted studies are needed on AI-aided deception in science, in which AI-generated text is disseminated as fake research articles (see the research prototype developed by Yampolskiy [46] in another research context) and AI-generated audiovisual or other material is possibly meant to display fake experiments or also fake historical samples (see the recent MIT deepfake demonstration [153] developed for educative purposes). 
However, this technical research direction requires a supplementation by transdisciplinary experts addressing the socio-psycho-technological impacts and particularly the epistemic impacts of corresponding future risk instantiations. We suggest that for a safety-relevant sense-making, AI governance may even need to stimulate debates and exchanges on the very epistemological grounding of science, before e.g. future texts written by maliciously designed sophisticated AI bots (also called sophisbots [128]) infiltrate the scientific enterprise with submissions that go undetected. For instance, there is a fundamental discrepancy⁹ between how Bayesian and empiricist epistemology would analyze this risk vs. how Popperian critical rationalist epistemology would view the same risk. Disentangling this epistemic issue is of high importance for AI safety and beyond, as becomes apparent in the guidelines linked to the next cluster E a 1 below.
• E a 1 : Near-term guidelines to directly tackle this extra cluster associated with automated disconcertion seem daunting to formulate. However, as a first small step, one could focus on how to avoid exacerbating it. One reason why this cluster may seem difficult to address is its deep and far-reaching epistemic implications pertaining to the nature of falsification, verification, fakery and (hyper-)reality [157] itself. With regard to this feature of epistemic relevance, E a 1 exhibits a commonality with the just introduced different risk of AI-aided deception in science. We postulate that in the light of pre-existing fragile circumstances in the scientific enterprise, including the emergence of modern "fake science" [158] patterns but also the mentioned fundamental discrepancies across epistemically-relevant scientific stances, AI-aided deception in science could have direct repercussions on automated disconcertion. First, it could for instance unnecessarily aggravate automated disconcertion phenomena in the general public as e.g. 
the belief in epistemic threats [154] could increase people's subjective uncertainty. Second, a reinforced automated disconcertion can subsequently be weaponized and instrumentalized by malicious actors with lethal consequences as generally depicted under the downward counterfactual DF narrative E a 1 described in Subsection 4.3. This explains our near-term AI governance recommendation to address AI-aided deception in science as a transdisciplinary collaborative endeavor analyzing socio-psycho-technological and epistemic impacts.
• R b 1 : For this cluster linked to risk Ib and pertaining to research on AI vulnerabilities currently performed by security researchers and white hats, we recommend (as analogously already explained in R a 1 ) to recruit such researchers preemptively. In this vein, Aliman [13] proposes to "organize a digital security playground where "AI white hats" engage in adversarial attacks against AI architectures and share their findings in an open-source manner". For the specific domain of intelligent systems, it is advisable to proactively equip these AIs with technical self-assessment and self-management capabilities¹⁰ [20] allowing for better real-time adaptability for the eventuality of attack scenarios known from past incidents or proof-of-concept use cases studied by security researchers and white hats. However, it is important to keep in mind that challenges from this cluster also deal with zero-day AI exploits. These are the unknown unknowns and cannot be meaningfully anticipated and prevented, though it is recognized that many issues could be caused by under-specification in machine learning systems [159].

9 Bayesian and empiricist epistemological stances placing the empirical collection of evidence and the identification of true beliefs at the center of science may link AI-aided deception to "epistemic threats" [154], i.e. knowledge-relevant impairments of belief-updating which they already see emerging via deepfakes (i.a. subsuming a general decrease of information in audiovisual samples [154]). By contrast, Popperian epistemic views [155] and especially their Deutschian extension [156] predominantly emphasize in the first place the explanatory and criticism-centered purpose of science next to the (experimental) falsifiability of hypotheses. Strikingly, Deutsch describes science as the endless quest for invariant, hard-to-vary explanations of reality [156]. On this view, AI-aided deception in science may be practically problematic, but without question solvable. In fact, while the empiricist direction faces epistemic threats and a post-truth difficulty, the Popperian and Deutschian direction may see neither explanatory knowledge, truth, falsifiability nor the scientific method per se at risk.
10 The conjunction of technical self-assessment and self-management has been summarized under the synonymous umbrella terms of Type I AI self-awareness [13], self-awareness functionality [20] or simply self-awareness.
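As an illustration of the identity-check guideline given for cluster A a 1 in the list above, the following Python sketch combines the three classical factor types with an offline pre-agreed inspection question. It is a minimal sketch under stated assumptions, not a deployable authentication scheme: `IdentityEvidence`, `verify_caller` and the `min_factors` parameter are hypothetical names, the individual checks are assumed to have been carried out elsewhere, and the only library call used is `hmac.compare_digest` from the Python standard library for constant-time comparison.

```python
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentityEvidence:
    """Outcome of the individual checks for one contact attempt (illustrative)."""
    biometric_match: bool    # something you are (e.g. voice), which can be cloned
    password_ok: bool        # something you know
    token_present: bool      # something you have (e.g. ID card)
    challenge_response: str  # answer to the offline pre-agreed inspection question


def verify_caller(evidence: IdentityEvidence, expected_response: str,
                  min_factors: int = 2) -> bool:
    """Accept only if enough factors hold AND the offline challenge is met."""
    factors = sum([evidence.biometric_match,
                   evidence.password_ok,
                   evidence.token_present])
    # A biometric match alone is never sufficient, since voice or video
    # may have been synthesized with generative AI.
    has_non_biometric = evidence.password_ok or evidence.token_present
    challenge_ok = hmac.compare_digest(
        evidence.challenge_response.encode("utf-8"),
        expected_response.encode("utf-8"),
    )
    return factors >= min_factors and has_non_biometric and challenge_ok


# A cloned voice passes the biometric check but fails everything else.
cloned = IdentityEvidence(True, False, False, "uh, what do you mean?")
print(verify_caller(cloned, "pre-agreed idiom"))
```

The design choice worth noting is that the offline challenge is a mandatory condition rather than one factor among others, reflecting the idea that an impersonator cannot react appropriately to a question whose meaning was agreed upon offline.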

RCRA (additional non-overlapping guidelines):
• A a 2 : Generally, one possible way to systematically reflect upon defense methods for specific RCRA instances (generated from downward counterfactual clusters) of harm intensity h_down ≥ τ could be to perform corresponding upward counterfactual deliberations targeting a harm intensity h_up < τ. As briefly introduced in Subsection 4.1, upward counterfactuals refer to those ways in which a certain event could have turned out better but did not. Recently, Oughton et al. [160] applied a combination of downward and upward counterfactual stochastic risk analysis to a cyber-physical attack on electricity infrastructure. In short, the difference to the method that we propose is that instead of focusing on slightly better upward counterfactuals given the factual event, as made sense in the case of Oughton et al., we suggest a threshold-based selection of below threshold upward counterfactuals given above threshold downward counterfactuals¹¹. For instance, as applied to the present downward counterfactual cluster A a 2 , which also included a narrative instance describing suicide attempts with lethal outcomes as a consequence of AI-aided defamation, harassment and revenge, it could simply consist in deliberations on how to avoid these lethal scenarios. This could be implemented by deliberating from the perspective of planning a human, hybrid or fully automated AI-based emergency team response with a highly restricted timeframe (e.g. to counteract the domino-effect initiated by the deployment of the deepfake sample on social media). Next to a proactive combination of deepfake detectors and content detectors for blocking purposes that can fail, a reactive automated social network graph analysis AI combined with sentiment analysis tools could be trained to detect large harassment and defamation patterns that, if paired with the sharing of audiovisual samples, can prompt a human operator. This individual could then decide to call in social services that in turn proactively contact the target offering support as analogously mentioned under the guidelines for the factual RDA sample A a 2 . 
• A a 3 : For this downward counterfactual cluster on AI-aided misinformation and disinformation of at least lethal dimensions, we focus on recommendations pertaining to journalism-relevant defenses and bots on social media. Disinformation from fake sources could be counteracted with the use of blockchain-based reputation systems [161] to assess the quality of information sources. Journalists could also maintain a collective blockchain-based repository containing all news-relevant audiovisual deepfake samples whose authenticity has been refuted so far. This tool could be utilized as a publicly available high-level filter to evade certain techniques of disinformation campaigns. Moreover, the case of hazardous large-scale disinformation supported by sophisticated automated social bots is of high relevance for what one can term social media AI safety. Ideally, tests for a "bot shield" enabling some bot-free social media spaces could be crafted. However, it is conceivable that at a certain point, AI-based bot detection [162] might become futile. Also, social bots already fool people [131,163] and many assume that humans will become unable to discern them in the long-term. Nevertheless, it could be worthwhile to consider what one could already have done better with present technological tools (the upward counterfactuals), which can also include the consideration of divergent unconventional solutions or novel formulations of questions. As stated by Barrett, "[...] progress in science is often not answering old questions but asking better ones" [164]. Perhaps, in the future, humans could still devise bot shielding tactics that could attempt to bypass epistemic issues [165] intrinsic to imitation game and Turing Test [166] derivatives where "real" and "fake" become relative.
• A a 4 : To tackle suicidal ideation as a consequence of AI-aided non-consensual voyeurism that enters the awareness of the targeted individual, one may need to extend the countermeasures already mentioned in the factual RDA counterpart A a 4 of this cluster (which also included the creation of public awareness and the removal of associated taboos). Social services and public institutions like universities and schools could offer emergency psychological interventions for the person at risk. Next to necessitated measures at the level of legal frameworks to protect underage victims, the subtle case of adult targets calls for instance for a civil reporting office collaborating with social media platforms which could initiate a critical dialogue with the other party to bring about an immediate deletion or at least categorical refraining from further dissemination of the material which can be calibrated to the expectations of the target. Recently, the malicious design of deepfakes has been described as a "[...] serious threat to psychological security" [167]. Adult targets may despite the synthetic nature of the deepfake samples and often eventually their private character restricted to a personal possession of the agent in question, perceive their mere existence as degradation [168] -a phenomenon certainly requiring social discourses in the long-term. For a principled analytical approach, an extensive psychological research program integrating a collaboration with i.a. AI security researchers could be helpful in order to be able to contextualize relevant socio-psycho-technological aspects against the background of advanced technical feasibility. Importantly, instead of limiting this research to deepfake artefacts in the AI field, one needs to also cover novel hybrid combination possibilities available for the design of non-consensual voyeuristic material. 
Notably, this includes blended applications at the intersection of AI and virtual reality [28,37] (or augmented reality [169]).
• R a 1 : Concerning proactive measures against future research where an adversary designs self-owned intelligent systems to trigger lethal accidents on victim intelligent systems, one might require legal norms setting minimum requirements on the techniques employed for the cybernetic control of systems deployed in public space. From an adversarial AI perspective, this could include the obligation to integrate regular updates on AI-related security patches developed in collaboration with AI security researchers and white hats that also study advanced physical adversarial attacks. This becomes particularly important as many stakeholders are currently unprepared in this regard [170]. As a guideline, we propose that future adversarial AI research endeavors explore attack scenarios where adversarial examples on physically deployed intelligent systems are delivered by another physically deployed intelligent system which potentially offers more degrees of freedom to the malicious actor. From a systems engineering perspective, any intelligent system might need to at least integrate multiple types of sensors and check for inconsistencies at the symbolic level. Next to explainability requirements, a further valuable feature to create accountability in the case of accidents could be a type of self-auditing via self-assessment and self-management [20] allowing for a retrospective counterfactual analysis on what went wrong.
• E a 1 : As its factual counterpart E a 1 , this counterfactual cluster E a 1 , referring to automated disconcertion instrumentalized for AI-aided information warfare and agitation on social media with the risk to incite lethal violence at large scales, represents a weighty challenge of international extent. As for E a 1 , multi-level piecemeal tactics of constructive small steps such as e.g. targeted methods to avoid exacerbating it may be valuable. Concerning AI governance, that could include the strategies mentioned under E a 1 but also more general efforts in line with international frameworks that aim to foster strong institutions and error-correction via life-long learning (see e.g. [19] for an in-depth discussion).
• R b 1 : For this counterfactual cluster pertaining to malicious research on vulnerabilities of deployed AI systems with the goal to trigger extensive fatal road accidents, we recommend tailored measures analogous to those presented for the counterfactual cluster R a 1 .

Near-term Guidelines for Risks Ic and Id
As can already be realized from the scope of the AI safety guidelines proposed in Subsection 5.1.1 which are grounded in our AI observatory exemplification of RDA and RCRA, modern AI technology cannot be analyzed in isolation. In our view, due to the complex multi-causal socio-psycho-technological interwovenness underlying AI risks and their instantiations, AI safety requires an inherently transdisciplinary, hybrid and cognitive-affective approach [13]. Transdisciplinarity is especially required to avoid cognitive blind spots within AI safety risk analyses and formulations of countermeasures or guidelines. AI safety needs a hybrid perspective to incorporate the intricacies of human-computer interactions necessitating a consideration of human nature next to purely technological viewpoints. Finally, a cognitive-affective perspective is called for due to the inseparably affective nature of human cognition [171,172] whose disregard in AI development can consequently engender significant safety issues by virtue of a lack of requisite variety [173]. While the last Subsection 5.1.1 focused on guidelines concerning the AI risks Ia and Ib related to intentional malice, this Subsection 5.1.2 is linked to the risks Ic and Id related to mistakes and unintentional failures which are often of ethically-relevant nature. This specific avenue of research represents a well-studied field at the core of modern AI ethics which recognizes multidisciplinarity, human-centeredness and socio-technical contextualization as important requirements [174]. In the last years, a large multiplicity of heterogeneous AI ethics guidelines have been proposed at an international level [175][176][177][178]. We refer the reader to Jobin et al. [179] for a global overview of internationally proposed AI ethics guidelines which are directly of relevance for the 5 failure clusters (F c 1 to F c 5 ) linked to risk Ic from the RDA presented in Subsection 3.3. 
In the following, we focus on the few remaining RDA and RCRA clusters which are not classically in the primary focus of AI ethics.

RDA:
• E c 1 : This cluster related to automated peer pressure can be i.a. met by measures raising public awareness on the dangers of the confirmation bias [180,181] reinforced via AI-empowered social media. However, a possible upward counterfactual on that issue would be to revert negative consequences of automated peer pressure by utilizing it for beneficial purposes. For instance, it is conceivable that automated peer pressure need not represent a threat if it were, perhaps paradoxically, to socially reinforce critical thinking instead of reinforcing tendencies to blindly copy in-group narratives. Ideally, such a peer pressure would reinforce heterophily (the antonym of homophily) with regard to various preferences, with one notable exception being the critical thinking mode itself. Hence, one interesting future-oriented solution for AI governance may be education and life-long learning [19] conveying critical thinking and criticism as invaluable tools for youth and the general public. For instance, critical thinking skills fostered in the Finnish public education system were effective against disinformation operations [91]. In fact, critical thinking, criticism and transformative contrariness may not only represent a strong shield to tackle disinformation or automated disconcertion and its risk potentials (cluster E a 1 and E a 1 respectively), but may also represent a crucial momentum for human creativity [13,182]. Generally, peer pressure is in itself a psychological tool that could be systematically used for good, for example by creating an artificial crowd [183] of peers with all members interested in desirable behaviors such as education, start-ups or effective altruism. A benevolent crowd of peers can then counteract hazardous bubbles on social media.
• F d 1 : Concerning AI failures rooted in unanticipated and yet unknown post-deployment scenarios, it becomes clear that accuracy and other AI performance measures cannot be understood as conclusive and set in stone. A possible proactive measure against post-deployment instantiations of yet unknown AI risks could be the establishment of a generic corrective mechanism. Problems which AI systems experience during their deployment due to differences between training and usage environments can be reduced via increased testing and continued updating and learning stages. On the whole, multiphase deployment, similar to vaccine approval phases, can reduce an overall negative impact on society and increase reliability. Finally, for each safety-critical domain in which AI predictions are involved in the decision-making procedure, one could, irrespective of present-day AI performance, foresee the proactive planning of a human response team in case of sudden expanding anomalies that a sensitized and safety-aware human operator could detect.

RCRA (additional non-overlapping guidelines):
• E c 1 : A twofold guideline for this counterfactual cluster (referring to automated peer pressure with lethal consequences via automated disconcertion instrumentalized for AI-aided disinformation) could be to weaken the influence of social bots by measures described under cluster A a 3 and by transforming automated peer pressure into strong incentives for critical thinking as stated in E c 1 .
• F d 1 : Finally, for this cluster of major risk dimension being the counterfactual counterpart of cluster F d 1 , we emphasize the importance of an early proactive response team formation in contexts such as for instance medical AI, AI in the financial market, AI-aided cybersecurity and critical intelligent cyberphysical assets. In short, AI systems should by no means be understood to be able to truly operate independently in a given task even if current excellent performance measures seem to suggest so. In the face of unknown and unknowable changes, performance is a moving target which, if mistaken as conclusive and static, could endanger human lives.

Long-Term Directions and Future-Oriented Contradistinctions
After having introduced a broad variety of near-term guidelines for future AI observatory endeavors based on the exemplified systematic factual and counterfactual retrospective analyses, we provide a differentiated more general outlook on explicitly long-term AI safety directions. For this purpose, we select two recent theoretical AI safety paradigms: on the one hand a direction that has been termed artificial stupidity (AS) (see [184][185][186]) and on the other hand, a direction that we succinctly call eternal creativity (EC) stemming from recent work [13,16,187]. Thereby, note that these two paradigms are by no means postulated to represent the full panoply of nuances and views across the entirety of the young AI safety field. Rather, we select these specific two examples because critical contradistinctions ascertainable via a comparative analysis point to a set of decisive bifurcations which might be of particular interest for the AI safety community due to their potentially axiomatic relevance for the future of AI research. While AS and EC coincide in multiple short-term considerations given their common hybrid cognitive-affective nature and their emphasis on cybersecurity-oriented practices, they fundamentally differ with regard to 3 future-relevant contradistinctions.
We consider the following 3 contradistinctive leitmotifs: 1) regulatory distinction criterium, 2) regulatory enactment and 3) substrate management. First, while AS primarily considers intelligence levels for 1), EC ponders the ability to consciously create and understand explanatory knowledge. Second, whilst AS foresees deliberate restrictions of AI capabilities as tool for 2), EC especially tackles their systematic enhancement. Third, while AS views substrate-dependent hardware analyses (next to software considerations) for bounded equalization between humans and AIs as approach to 3), EC aims at unbounded substrate-independent functional augmentation. While there exist certainly more possible lines along which one could compare AS and EC, we focus on the mentioned 3 themes due to their urgency and potential to foster constructive dialectics in future theoretical and applied AI (safety) research beyond AI observatory contexts. In Subsection 5.2.1 and 5.2.2, we briefly provide a general introduction followed by a summarization of long-term AI safety guidelines formulated from the perspective of AS and EC respectively as seen through the lens of these 3 contradistinctions.

Paradigm Artificial Stupidity (AS)
One core assumption in the AS paradigm is that an artificial general intelligence (AGI) "[...] can be made safer by limiting its computing power and memory, or by introducing Artificial Stupidity on certain tasks" [185]. Thereby, an AI system is understood to be made artificially stupid on a certain task if its capabilities are deliberately limited by human designers for the purpose of matching the human performance on that task. One mentioned exemplary domain where such a technique is already applied is text-to-speech synthesis, e.g. in Google Duplex, an AI for natural conversations over the phone whose implementation included "[...] the incorporation of speech disfluencies (e.g. "hmm"s and "uh"s)" [188]. Another example is the context of video games, where AI can in principle vastly exceed human performance but is purposefully restricted in order to allow for a positive human-centered gaming experience. More generally, there are many AI application domains where it is human-desirable to mimic anthropic performance or behavioral patterns for an improved customer service. These cases correspond to a type of imitation game which only succeeds if the AI does not reveal latent super-human capabilities. From that point of view, the AS paradigm conceives of making an AI artificially stupid as being necessary to making it pass a Turing test [185,186].
Simultaneously, in the last years, AI achieved superhuman-level performance across more and more tasks. Further, it is assumed in AS that "[...] AI tends to quickly achieve super-human level of performance after having achieved human-level performance" [186]. Against this background, AS argues distinguishingly that "[...] by limiting an AI's ability to achieve a task, to better match humans' ability, an AI can be made safer, in the sense that its capabilities will not exceed humans' capabilities by several orders of magnitude" [186]. In short, AS postulates that AI ability needs to be upper-bounded by human performance since it risks to otherwise become uncontrollable 12 once it turns into what Bostrom termed a superintelligence -an intellect exceeding human cognitive performance across "[...] virtually all domains of interest" [191]. Such a hypothetical future artificial superintelligence is believed to not necessarily be value-aligned with humans (while potentially becoming unintelligible to humans due to the gaps in performance), to be capable of insidious betrayal (a scenario termed treacherous turn [191]) and to potentially represent a major risk [192] to humanity.
• Regulatory distinction criterium: In this light, one can extract intelligence (or more broadly "performance" or "cognitive performance" across tasks) as the recurring theme of relevance for regulatory AI safety considerations under the AS paradigm. At a first level, one could identify two main safety-relevant clusters: a cluster comprising all AIs that are less capable than or equally capable as an average human [186] and another cluster of superintelligent AI systems. The latter can be further subdivided into three classes of systems as introduced by Bostrom [191]: 1) speed superintelligence, 2) collective superintelligence and 3) quality superintelligence. According to Bostrom, the first ones "can do all that a human intellect can do, but much faster", the second ones are "composed of a large number of smaller intellects such that the system's overall performance across many very general domains vastly outstrips that of any current cognitive system" and the third ones are "at least as fast as a human mind and vastly qualitatively smarter" [191].
• Regulatory enactment: In a nutshell, AS recommends limiting an AI in hardware and software such that it does not attain any of these enumerated sorts of superintelligence since "[...] humans could lose control over the AI" [185]. AS foresees regulatory strategies on "how to constrain an AGI to be less capable than an average person, or equally capable, while still exhibiting general intelligence" [186].
• Substrate management: To limit AI abilities while maintaining functionality, AS proposes multiple practical measures at the hardware and software level. Concerning the former, it proposes diverse restrictions especially pertaining to memory, processing, clock speed and computing [186]. With regard to software, it foresees necessary limits on self-improvement as well as measures to avoid treacherous turn scenarios [185]. Another guideline consists in deliberately incorporating known human cognitive biases in the AI system. More precisely, AS postulates that human biases "can limit the AGI's intelligence and make the AGI fundamentally safer by avoiding behaviors that might harm humans" [185]. Overall, the substrate management in AS can be categorized as substrate-dependent because the artificial substrate is among others specifically tuned to match hardware properties of the human substrate for at most equalization purposes. In summary, AS subscribes to the viewpoint that AI safety aims to "limit aspects of memory, processing, and speed in ways that align with human capabilities and/or prioritize human welfare, cooperative behavior, and service to humans" [184] given that AGI "[...] presents a risk to humanity" [184].

Paradigm Eternal Creativity (EC)
According to Deutsch, "the only uniquely significant thing about humans [...] is our ability to create new explanations [...]" [156]. He further specifies that explanatory knowledge "gives people a power to transform nature which is ultimately not limited by parochial factors, as all other adaptations are, but only by universal laws" [156]. Instead of emphasizing levels of intelligence or of performance across a wide set of tasks when analyzing AI safety issues, EC focuses epistemologically on one unique "task": the ability to consciously create and understand explanatory knowledge. Thereby, in EC, explanatory knowledge creation also implies the capability to consciously understand. Given that core affect is understood as a fundamental property of consciousness [164,171] and is linked to cognitive-affective counterfactual deliberations [16], this excludes philosophical zombie themes [193]. (In modern embodied and enactive cognition frameworks [164,194], consciousness is linked to processes of inference for the cybernetic control of a substrate in an environment connected to allostasis [172] (anticipation of needs before they occur [164]) -integrating predictions and error signals from external and internal milieu. It is on such cybernetic control grounds that affective dynamics give rise to the egocentric virtual first-person perspective of the world [195,196] familiar to humans and lacking in present-day AI.) Note that EC's focus on consciously creating and understanding explanatory knowledge is by no means an anthropomorphic assumption forced on AI systems. As elucidated in constructor theory [197,198], a novel explanatory framework in physics, explanatory knowledge creators (of which currently only humans are known) are brought to the fore in physics in an entirely non-anthropocentric way. To put it very simply, constructor theory focuses on possible vs. impossible counterfactuals i.e. 
what could happen given physical laws and why (instead of predictions based on laws of motion and initial conditions). On contemplating the set of all physical transformations that would be possible in the universe i.e. those that could happen, one would notice that the size of the very subset containing those transformations that actually happen can be strongly influenced by entities able to create and understand knowledge on how to bring them about [156]. This is how explanatory knowledge creation enters "the cosmic scheme of things" [156] and this is also why EC prioritizes the conscious understanding and creation of explanatory knowledge via creativity 13 instead of intelligence.
At first sight, given the fundamental unpredictability of future explanatory knowledge, it might seem dangerous for AI safety. Deutsch mentions that "no good explanation can predict the outcome, or the probability of an outcome, of a phenomenon whose course is going to be significantly affected by the creation of new knowledge" [156] and further that this fundamental limitation is something that "when planning for the future, it is vital to come to terms with it" [156]. EC agrees. EC recently formulated the AI safety paradox [13,16] stating that value alignment and control are conjugate requirements in AI safety. This means that both prevailing ideals cannot be simultaneously fulfilled. EC also states that "the price of security is eternal creativity" [13]. So despite the AI safety paradox, a cybersecurity-oriented and risk-centered AI safety is possible -when reframed "as a discipline which proactively addresses AI risks and reactively responds to occurring instantiations of AI risks" [13]. In short, AI safety is not condemned, it just needs to come to terms with the compulsion to keep correcting and creating solutions "ad infinitum".
• Regulatory distinction criterium: EC distinguishes two substrate-independent and disjoint sets of systems: Type I and Type II systems. Type II systems are all systems for which it is possible to consciously create and understand explanatory knowledge. Type I systems are all systems for which this is an impossible task 14 . Thereby, a subset of Type I systems can be conscious (such as non-human mammals) and requires protection akin to animal rights. Obviously, all present-day AI systems are of Type I and non-conscious. Type II AI is non-existent today.
• Regulatory enactment: In theory, with a Type II AI, "a mutual value alignment might be achievable via a co-construction of novel values, however, at the cost of its predictability" [13]. As with all Type II systems (including humans), the future contents of the knowledge they will create are fundamentally unpredictable, irrespective of any intelligence class 15 . In EC, this signifies that: 1) Type II AI is uncontrollable and requires rights on a par with humans, 2) Type II AI could engage in a mutual bi-directional value alignment with humans if it decides so and 3) it would be unethical to enslave Type II AI. (Finally, banning Type II AI is a potential loss of requisite variety and does not hinder malicious actors from developing it.) By contrast, regarding Type I AI, EC implies that: 4) Type I AI is controllable, 5) Type I AI cannot be fully value-aligned across all domains of interest for humans due to an insufficient understanding of human morality, 6) conscious Type I AI is possible and would require animal-like rights but it is clearly non-existent nowadays.
• Substrate management: To avoid functional biases [201] due to a lack of diversity in information processing, EC opts for a substrate-independent functional view. Irrespective of its specific substrate composition, an overall panoply of systems is viewed as one unit with diverse functions. Given Type-II-system-defined cognitive-affective goal settings, a systematic function integration can yield complementary synergies. Notably, EC recommends research on substrate-independent functional artificial creativity augmentation [187] (artificially augmenting human creativity and augmenting artificial creativity). For instance, active inference could technically increase Type I AI exploratory abilities [202,203]. Besides that, in Subsection 6.2, we apply a functional viewpoint to augment RCRA DF generation by human Type II systems for AI observatory purposes.
evolution -as mistakenly assumed by Popper [16,199]. This is epistemologically relevant because ideas are not created by blind trial and error (as variation and selection in biological evolution). Even if novel idea contents are fundamentally unpredictable a priori, idea variation is partially guided by previous experience, the task and contextual cues i.e. there is a non-zero coupling between variation and selection [199]. Creativity itself could have historical roots in serendipity and multi-purpose socially shared doubt [13] facilitating in theory error-correction but initially largely used to maintain traditions.
14 EC could be stated to apply a constructor-theoretic distinction to AI safety insofar as it applies a possibility-impossibility dichotomy [197] embedded in an explanatory framework to it.
15 Under EC, superintelligence is as explained not of distinctive interest. It is also viewed as not implying profound qualitative differences to human baselines. Following Deutsch, it would be "[...] subject only to limitations of speed or memory capacity, both of which can be equalized by technology" [200]. EC views human augmentation as valid transformative defense strategy [187].

RDA Data Collection
For the collection of RDA samples utilized for illustration purposes in this paper, we undertook a simple keyword-based web search limited to articles in the period between 2018 and 2020. The main queries (with associated boolean operators) that we considered were: "artificial intelligence", "AI", "autonomous", "neural network", "deepfakes", "AI" AND "bias", "AI" AND "failure", "AI" AND "security", "AI" AND "safety", "AI" AND "attack". While many terms are tailored to the type of keys represented in the taxonomy (Ia, Ib, Ic, Id) that served as basis for categorization in the RDA as introduced in Section 2, we also considered maximally general queries such as "artificial intelligence" in order to do justice to the eventuality that we might identify a novel, entirely unexpected categorization pattern. In other words, we also foresaw the possibility of not yet encountered anomalies while analyzing the results. As briefly mentioned in Subsection 4.2, such a case would have been assigned to a generic placeholder key for novel unknown patterns. It would have called for further scrutiny and eventually for a future enlargement of the taxonomy. However, as mentioned in Subsection 4.2, we did not yet identify any novelty of this kind in the discussed RDA. At a lower level, though, we discovered atypical instances of the pre-existing key-determined clusters. We tagged this atypicality by referring to corresponding clusters with the attribute "extra", which was the case for the extra cluster of automated disconcertion linked to risk Ia and the extra cluster of automated peer pressure connected to risk Ic.
Self-evidently, the underlying search can be performed in a more sophisticated way in future AI observatory projects. First, a broader range of keys and combinations can be strategically devised in the light of RDA and RCRA results from a previous AI observatory iteration. Second, the efforts can be supported by web crawlers [204]. Third, this could be combined with sentiment analysis tools [205] to detect negatively polarized texts of interest for an RDA. Fourth, the creation of novel datasets for text classification [206] could be undertaken for the pre-existing keys of the taxonomy, which might however remain insufficient with regard to placeholders for novel patterns. In this vein, we stress the importance of human analysts for a deep semantic understanding requiring explanatory knowledge, especially when it comes to the discovery of subtle novel tendencies within superficially similar text sources. Moreover, an intense examination of textual material can lead to a further disentanglement of pre-existing clusters, which could even reveal the need for a broader change of the taxonomic keys. In short, a safety-aware responsible RDA data collection pipeline is not entirely automatable and requires human-level understanding by analysts.
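As a minimal sketch of the simple keyword-based search described at the start of this subsection, the following Python snippet filters a list of already retrieved articles by publication period and by the listed boolean queries. The `Article` record, the function names and the interpretation of "between 2018 and 2020" as the inclusive range from 1 January 2018 to 31 December 2020 are illustrative assumptions and not part of the original procedure; the retrieval step itself (search engine or crawler) is deliberately left out.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    """Hypothetical record for one retrieved article."""
    title: str
    text: str
    published: date


# Each query is a tuple of terms joined by AND; the queries themselves are OR-ed.
QUERIES = [
    ("artificial intelligence",), ("AI",), ("autonomous",),
    ("neural network",), ("deepfakes",),
    ("AI", "bias"), ("AI", "failure"), ("AI", "security"),
    ("AI", "safety"), ("AI", "attack"),
]


def term_in(term: str, text: str) -> bool:
    """Whole-word, case-insensitive match, so that 'AI' does not hit 'said'."""
    return re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE) is not None


def matching_queries(article: Article, queries=QUERIES):
    """Return every query whose terms all occur in the title or text."""
    body = f"{article.title} {article.text}"
    return [q for q in queries if all(term_in(t, body) for t in q)]


def collect_candidates(articles, start=date(2018, 1, 1), end=date(2020, 12, 31)):
    """Keep articles inside the assumed period that match at least one query."""
    hits = []
    for article in articles:
        if start <= article.published <= end:
            matched = matching_queries(article)
            if matched:
                hits.append((article, matched))
    return hits
```

Returning the matched queries alongside each article keeps the output inspectable for the human analysts, who remain responsible for the actual taxonomization and for spotting patterns that none of the pre-existing keys anticipates.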

Interlinking RDA-based RCRA Pre-processing and RCRA DFs
As elucidated in Subsection 4.2, the preparatory procedure generating candidate RCRA clusters based on RDA instances consisted of 4 consecutive steps: 1) taxonomization, 2) analytical clustering, 3) brute-force deliberation and threshold-based pruning and finally 4) assembly. Subsequently, these RCRA clusters served as basis to generate RCRA DFs that we exemplified with short RCRA narratives instantiating these clusters as presented in Subsection 4.3. However, for the sake of simplicity, the exact methodological approach to interlink the preparatory procedure and the RCRA co-creation DF was not previously characterized. In a nutshell, we utilized a method we call complementary cognitive co-creation (CCC). While other methods are conceivable, we encourage considering CCC where possible for reasons described in the next paragraphs. Beforehand, we must specify that the set of researchers involved in the preparatory procedure of the RCRA and the set of researchers performing the ensuing RCRA DFs were purposefully disjoint. For clarity, we refer to the former as preparatory group and to the latter as executive group. We explain how a complementary collaborative effort between these groups in the form of CCC can increase the variety and illustrative power of RCRA DFs.
After applying taxonomization and analytical clustering to the RDA instances, the preparatory group has been described in Subsection 4.2 to perform brute-force deliberation and threshold-based pruning. While a brute-force search could appear suboptimal at first sight, we specifically considered this option in order to allow the preparatory group to retrospectively diversify the generation of instances performed by the executive group given the RCRA clusters. This becomes possible because, whilst the preparatory group goes through every single available RDA instance, it attempts to generate an above threshold downward counterfactual that, if identified, can later turn out to be useful to store. In short, when a downward counterfactual is successfully generated for a given RDA sample, the preparatory group can not only maintain the RDA sample, but also store the generated downward counterfactual instance for later RCRA augmentation purposes. Thereby, as briefly specified, generic RCRA clusters were used instead of specific instances as inputs for the RCRA DFs to avoid overfitting to the idiosyncrasies of unique events and possibly generate a broader variety of DF scenarios. In fact, by solely providing RCRA clusters to the executive group at the start of the DFs, we avoid a potentially biased negative influence by the narrow instances of the preparatory group that fulfilled a different primary function (namely the identification of above threshold patterns). To recapitulate, the preparatory procedure can be more precisely re-explained as follows: the preparatory group undergoes all 4 consecutive steps with the crucial additional detail that the brute-force deliberation and threshold-based pruning operation also includes the storing of a successfully generated downward counterfactual instance for each maintained factual RDA instance.
After this pre-DF processing, the preparatory group delivers the RCRA clusters to the executive group, which then engages in generating a variety of narrative instances for each obtained cluster. Post-DF, the executive group compares the generated instances with those imagined by the preparatory group pre-DF. All cases that were not yet considered by the executive group 16 but were generated by the preparatory group are concatenated to the now augmented DFs. Duplicates are ignored.
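The post-DF comparison and concatenation can be sketched as follows. This is only an illustration under stated assumptions: narratives are represented as plain strings keyed by a cluster label, and duplicates are detected by a simple case- and whitespace-insensitive comparison, whereas in practice judging whether two narratives coincide would be left to the researchers. The names `augment_design_fictions` and `_normalize` are hypothetical.

```python
from typing import Dict, List


def _normalize(narrative: str) -> str:
    """Assumed duplicate criterion: ignore case and surplus whitespace."""
    return " ".join(narrative.lower().split())


def augment_design_fictions(
    executive: Dict[str, List[str]],
    preparatory: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """For each RCRA cluster, keep the executive group's narratives and append
    those preparatory narratives the executive group did not yet consider."""
    clusters = list(executive) + [c for c in preparatory if c not in executive]
    augmented: Dict[str, List[str]] = {}
    for cluster in clusters:
        seen = set()
        narratives = []
        # Executive narratives come first; preparatory ones are appended.
        for narrative in executive.get(cluster, []) + preparatory.get(cluster, []):
            key = _normalize(narrative)
            if key not in seen:  # duplicates are ignored
                seen.add(key)
                narratives.append(narrative)
        augmented[cluster] = narratives
    return augmented
```

If the executive group produced no narrative for a cluster, the result for that cluster still contains the preparatory group's stored instance, which mirrors the back-up instantiation mentioned in the accompanying footnote.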
This overall sequence of steps presents a theoretical collaborative basis for an augmentation of co-creation DFs, which we refer to as CCC. A further tool that may improve the efficacy of CCC is to add a functional viewpoint (i.e. related to information processing in a certain context). On closer inspection, it becomes clear why CCC can profit from a functional and/or (neuro-)cognitive [207][208][209] diversity of the partaking researchers. Given that in the human cognitive domain, variety is the norm [210] and heterogeneity can provide requisite variety in complex multi-causal dynamic problem domains [201] necessitating collective learning [207] and innovation [211], it makes sense to explore this potential. For instance, while the preparatory group can especially profit from individuals that excel at convergent thinking, the executive group may benefit from divergent thinkers. Pre-DF, the preparatory group needs to map from one factual instance to one counterfactual instance. In the DF, the executive group maps from one counterfactual cluster to many counterfactual instances. The former requires a horizontal integration at a low level of abstraction while the latter requires a vertical integration from a higher to a lower level of abstraction, revealing the potential for complementary synergies 17. A CCC-based approach combining a preparatory group comprising i.a. individuals with a cognitive profile exhibiting strengths in the former and an executive group i.a. sampled from a pool of individuals with strengths in the latter could increase efficiency, variety and illustrative power of RDA-based RCRA co-creation DFs, which is critical to raise safety-awareness in experts but also in the public.

16 Note that if, given an RCRA cluster, the executive group does not succeed in imagining a corresponding instance for a narrative, there is always at least one back-up instantiation, which corresponds to the narrative envisaged by the preparatory group pre-DF (whose identification represented the precondition for this cluster to exist in the first place).

17 For instance, despite possible significant context-dependent [210] hindrances, dyadic mismatches [212] and disabilities, autistic traits are also paired with enhanced convergent thinking [213], detail-rich thinking [214] and higher verbal creativity [215] while attention deficit hyperactivity disorder traits have been linked to enhanced divergent thinking [216,217] and enhanced originality and flexibility [218]. Systematically combining these two complementary cognitive profiles under a CCC-oriented approach to RDA-based RCRA-DFs for AI observatory feedback-loops could engender benefits.

Conclusions
Starting with a cybersecurity-oriented fit-for-purpose taxonomy of ethical distinction, we introduced and exemplified a retrospective descriptive analysis (RDA) for future AI observatory projects. Subsequently, we elucidated how to craft a complementary retrospective counterfactual risk analysis (RCRA) based on downward counterfactuals from the previously extracted factual RDA samples. Motivated by recent work on risk management of hazardous events [14] and the functional theory of counterfactual thinking [15] from social psychology, we elaborated on why an RDA-based RCRA may be suitable for risk analyses in a complex multi-causal domain such as AI safety. Thereafter, in the light of the ethical sensitivity of AI risk instantiations, we discussed the use of harm intensity ratings for samples of an AI observatory given the perceiver-dependent, harm-based and dyadic nature of human cognitive templates in morality [25]. For illustrative purposes, we suggested a threshold-based approach focusing the RDA-based RCRA on downward counterfactuals of at least lethal dimensions. On the one hand, such a high threshold may engender fewer discrepancies in harm-related moral perception. On the other hand, it may simultaneously represent a suitable threshold reinforcing mortality salience (i.e. the awareness of one's mortality). From the perspective of a relevant socio-psychological theory known as terror management theory [219,220], mortality salience (whose elicitation is also conceivable in co-creation design fictions from HCI including virtual reality settings [221]) may be able to foster safety-awareness and cautionary attitudes [221,222]. Against the backdrop of the RDA samples collected and our targeted RDA-based RCRA, we formulated the need for inherently transdisciplinary and hybrid cognitive-affective AI observatory and AI safety strategies. As guidelines for future work, we compiled a rich variety of tailored multi-level near-term solutions.
Finally, we provided a differentiated general outlook on long-term AI safety directions by axiomatically contrasting two disparate recent AI safety paradigms along relevant contradistinctive leitmotifs. More precisely, we contrasted the artificial stupidity (AS) paradigm with the eternal creativity (EC) paradigm. While AS and EC share a common cybersecurity-oriented and hybrid cognitive-affective stance with regard to multiple near-term AI safety solutions, they differ fundamentally in many future avenues of research. AS offers intelligence-focused, restriction-based and tailored substrate-dependent long-term guidelines. By contrast, long-term EC guidelines bring into focus conscious explanatory knowledge creation and understanding and recommend unbounded functional augmentation of substrate-independent nature. While AS suggests utilizing human cognitive performance as an upper bound for AI capabilities to limit hardware and software parameters, EC takes a cybernetic perspective according to which humans need to jointly augment both human and AI functions, e.g. via doubly ambiguous artificial creativity augmentation research.
In a nutshell, we collated retrospective analyses complemented by future-oriented contradistinctions in order to: 1) apprise future AI observatory projects using concrete examples from practice and technically plausible above-threshold downward counterfactuals, 2) thematize possibly decisive bifurcations in future AI (safety) research and 3) point out the requirement of a constructive collaborative dialectical approach addressing those. As stated by Popper, "while differing widely in the various little bits we know, in our infinite ignorance we are all equal" [155]. Time might tell whether the assumption that "the price of security is artificial stupidity" or rather that "the price of security is eternal creativity" [13] (or none of those) turns out to practically solve long-term AI safety problems. Either way, explanatory knowledge co-creation can heavily influence whether we will succeed in understanding how to transform today's vulnerability awareness and mortality salience into the currently known or unknowable upward counterfactuals of our counterfactual future.