A Holistic Framework for Forecasting Transformative AI

Abstract: In this paper we describe a holistic AI forecasting framework which draws on a broad body of literature from disciplines such as forecasting, technological forecasting, futures studies and scenario planning. A review of this literature leads us to propose a new class of scenario planning techniques that we call scenario mapping techniques. These techniques include scenario network mapping, cognitive maps and fuzzy cognitive maps, as well as a new method we propose that we refer to as judgmental distillation mapping. This proposed technique is based on scenario mapping and judgmental forecasting techniques, and is intended to integrate a wide variety of forecasts into a technological map with probabilistic timelines. Judgmental distillation mapping is the centerpiece of the holistic forecasting framework, in which it is used to inform a strategic planning process as well as future iterations of the forecasting process. Together, the framework and new technique form a holistic rethinking of how we forecast AI. We also include a discussion of the strengths and weaknesses of the framework, its implications for practice and its implications for the research priorities of AI forecasting researchers.


Introduction
In a world of quick and dramatic change, forecasting future events is challenging. If this were not the case, then meteorologists would be out of a job. However, meteorological forecasting is relatively straightforward today given the relatively low price of computation, the advanced capabilities of numerical simulation and the myriad powerful sensors distributed around the world for collecting input information. Forecasting technological progress and innovation, however, is much more difficult because there is no past data to draw upon and future technologies are at best poorly understood [1]. Forecasting progress toward broadly capable AI systems is more difficult still because we do not yet know the fundamental architectures that may drive such systems. This decade has seen significant milestones in AI research realized [2][3][4][5], and the realization of these milestones has led many to perceive the rate of AI progress to be increasing. This perceived increase in the rate of progress has been accompanied by substantial increases in investment, as well as increased public and governmental interest. Consequently, there is a growing group in the AI strategy research community that is working to measure progress and develop timelines for AI, with significant effort focusing on forecasting transformative AI or human-level artificial intelligence (HLAI; we consider forecasts for human-level machine intelligence, high-level machine intelligence and artificial general intelligence to be equivalent to forecasts of HLAI). Efforts to these ends, however, are not unified, and the study of AI forecasting more broadly does not appear to be directed at a well-understood objective. Only one previous study has proposed a framework for forecasting or modeling AI progress [6]. This paper outlines an alternative to that previous framework that utilizes judgmental, statistical and data-driven forecasting techniques as well as scenario analysis techniques.
To be certain, efforts to forecast AI progress are of paramount importance. HLAI has the potential to transform society in ways that are difficult to anticipate [7]. Not only are its impacts difficult to imagine, but the notion of HLAI itself is ill-defined; what may be indicative of human-level intelligence to some may not be sufficient to others, and there is no definitive test for human-level intelligence. This has led studies concerned with forecasting AI progress or HLAI to focus on the replacement of humans at jobs or tasks [9,10]. The lack of an objective definition for HLAI is due in part to the fact that we do not know how to create it. In theory, HLAI could be instantiated by one algorithm [11] or constructed by combining different components [12]. In order to adequately address this and other unique challenges faced in forecasting HLAI, methods that integrate diverse information and a variety of possible paths are required.
The necessity of planning for HLAI is obvious. It is also plausible, and perhaps even likely, that AI will have severe transformative effects on society without reaching human-level intelligence. A formal description of the extreme case for such a scenario is Drexler's notion of comprehensive AI services (CAIS) [13]. Thus, for the purpose of ensuring that AI is developed to do the most good possible for humanity, we identify the primary task of AI forecasting to be that of forecasting transformative AI. We define transformative AI to be any set of AI technologies that has the potential to transform society in ways that dramatically reshape social structures or the quality of life for social groups.
Here, we take the position that AI forecasts solely in the form of timelines (dates given by which we should expect to have developed transformative AI) are undesirable. To address this issue we propose a new AI forecasting framework along with a new scenario mapping technique that supports the framework. Independently, the framework and the new method each constitute novel contributions to the body of knowledge. However, together the framework and new technique demonstrate a holistic rethinking of how we forecast AI. It is this new perspective that we believe to be the paper's most significant contribution.
In the following pages the paper proceeds by first examining related literature. We do not consider the broader body of literature for the relevant topics; rather, we focus only on the salient elements. After outlining scenario planning techniques, we move to propose a new subclass of scenario mapping techniques. Next, we propose a new method as part of this subclass which we call judgmental distillation mapping. This new method is then described as a critical component of the new AI forecasting framework. Following this description of the framework, we discuss strengths, weaknesses, the implications for practice and the implications for future research in AI forecasting.
We conclude by summarizing the key ideas and recommendations.

Background
This section examines several bodies of literature relevant to the holistic framework being proposed. This literature review is by no means comprehensive, and, due to the large number of academic disciplines and techniques covered, a more extensive literature review is suggested for future work.
We consider the research topics of forecasting, technology forecasting, scenario analysis and AI forecasting, as well as including a brief discussion of digital platforms.

Forecasting
Big Data Cogn. Comput. 2019, 3, x FOR PEER REVIEW

Forecasting techniques are commonly broken down into two broad classes: judgmental methods and statistical methods [14]. Statistical methods are preferred for most forecasting applications and can range from simple extrapolations to complex neural network models or econometric systems of simultaneous equations [15]. However, statistical methods perform poorly in cases with little or no historical data, cases with a large degree of uncertainty and cases involving complex systems [16]. In such situations it is common to fall back on judgmental techniques. In this subsection we forgo any discussion of statistical methods to focus on the different judgmental techniques and the challenges of expert selection.
Surveys are likely the most widely used judgmental technique. They solicit opinion from multiple experts without interaction between them. This technique is widely used because it is straightforward to implement and relatively inexpensive [1]. Challenges to this method include sampling difficulties, especially those due to nonresponses. The Cooke method (or the classical method) of assessing the quality of expert judgments for expert elicitation comes from the field of risk analysis [17]. It is a very powerful technique that involves the inclusion of calibration questions to calibrate the experts' forecasts so that they may be weighted during aggregation [18].
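The calibration-weighting idea behind the Cooke method can be sketched as follows. This is a simplified illustration, not the full classical method: the calibration scores are assumed to have been computed from seed questions beforehand, whereas the real method derives weights from a chi-squared calibration statistic and an information score.

```python
# A simplified sketch in the spirit of Cooke's classical method: experts
# are weighted by calibration scores assumed to come from seed questions
# scored beforehand. Only the weight-and-exclude structure is kept here.

def weighted_aggregate(forecasts, calibration_scores, cutoff=0.0):
    """Aggregate expert point forecasts, weighting by calibration score.

    Experts at or below `cutoff` get zero weight, mirroring the classical
    method's exclusion of poorly calibrated experts.
    """
    weights = [s if s > cutoff else 0.0 for s in calibration_scores]
    total = sum(weights)
    if total == 0:
        raise ValueError("no expert passed the calibration cutoff")
    return sum(w * f for w, f in zip(weights, forecasts)) / total

# Three experts forecast the same quantity; the best-calibrated expert
# dominates the aggregate (result is close to 16.0 here).
print(weighted_aggregate([10.0, 20.0, 40.0], [0.6, 0.3, 0.1]))
```

With all calibration scores equal, the function reduces to a simple average, so the weighting only matters when the seed questions actually discriminate between experts.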
The Delphi technique was developed at the Rand Corporation in the 1950s, at the same time as the development of scenario planning methods [19]. This approach involves a group of experts participating in an anonymized forecasting process through two or more rounds [1]. Each round involves answering questionnaires, aggregating the data and exchanging the summarized results and comments. Expert participation, expert selection and the falloff rate of participants over iterative survey rounds are the primary challenges. The Delphi technique is powerful and versatile, with the capability to be used for scenario building exercises as well as forecasting, and the flexibility to support large groups of experts with small modifications [20]. Despite its wide use for over half a century, there are still many questions about fundamental issues of its effectiveness for certain situations [21,22].
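The iterative structure of a Delphi exercise can be caricatured numerically. In the toy model below, each expert is assumed to revise their estimate toward the group median by a fixed fraction after seeing the summarized results of each round; real Delphi studies use questionnaires and qualitative feedback rather than a mechanical update rule.

```python
import statistics

# A toy model of Delphi-style convergence, assuming each expert revises
# toward the group median by a fixed fraction each round. This only
# illustrates the iterative-aggregation structure of the method.

def delphi_rounds(estimates, rounds=3, pull=0.5):
    history = [list(estimates)]
    for _ in range(rounds):
        med = statistics.median(estimates)
        estimates = [e + pull * (med - e) for e in estimates]
        history.append(list(estimates))
    return history

# Three experts estimate years until a milestone; the spread narrows
# round by round while the median stays put.
history = delphi_rounds([10.0, 25.0, 60.0])
print(history[-1])
```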
Prediction markets are exchange-traded markets intended for predicting the outcomes of events. They rely on a platform that allows people to make trades depending on their assessment of these outcomes. Prediction market contracts are binary options that are created to represent a forecasting target and then traded through the market. During trading, the market price of a contract adjusts dynamically to account for participants' predictions and is used as an indicator of the probability of these events. This incentivizes participants to be as accurate as possible in order to receive the most gain, while allowing for aggregation over an arbitrarily large market. The free market tends to collect and aggregate predictive information well due to the strong economic incentives for better information. Consequently, prediction markets often produce forecasts that have lower prediction error than conventional forecasting techniques [23]. Green et al. performed a comparison of the Delphi technique and prediction markets, finding that, when feasible, prediction markets have some advantages, but that the Delphi technique was still generally underutilized [24]. However, the advantages of each technique were also dependent on the problem. Prediction markets performed better for short-term, straightforward problems whereas the Delphi technique was useful for a broader range of problems and for high uncertainty situations.
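The price-adjustment mechanism can be made concrete with Hanson's logarithmic market scoring rule (LMSR), one common automated market-maker mechanism for prediction markets. The liquidity parameter and trade size below are illustrative choices, not recommendations.

```python
import math

# A minimal sketch of Hanson's logarithmic market scoring rule (LMSR).
# `b` is the liquidity parameter; `q` holds net shares sold per outcome.

def lmsr_cost(q, b=100.0):
    # cost function C(q) = b * log(sum_i exp(q_i / b))
    return b * math.log(sum(math.exp(x / b) for x in q))

def lmsr_prices(q, b=100.0):
    # instantaneous prices sum to 1 and read as outcome probabilities
    z = sum(math.exp(x / b) for x in q)
    return [math.exp(x / b) / z for x in q]

def buy(q, outcome, shares, b=100.0):
    """Cost a trader pays to buy `shares` of `outcome`; returns (cost, new q)."""
    new_q = list(q)
    new_q[outcome] += shares
    return lmsr_cost(new_q, b) - lmsr_cost(q, b), new_q

q = [0.0, 0.0]               # binary market opens at 50/50
cost, q = buy(q, 0, 50.0)    # a trader buys "yes" shares
p_yes, p_no = lmsr_prices(q)
print(round(p_yes, 3))       # the "yes" price has risen above 0.5
```

The key property illustrated is the one the paragraph describes: a trade moves the price, and the price at any moment is interpretable as the market's aggregate probability.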
Superforecasting is a recently developed technique that utilizes groups of forecasting experts, i.e., superforecasters, in combination with advanced aggregation techniques to generate forecasts. Superforecasting has been demonstrated to be more accurate than prediction markets and to forecast certain types of targets (e.g. geopolitical events) better than any other methods [25]. The technique was developed using forecasting tournaments in a competition run by the US Intelligence Advanced Research Projects Activity (IARPA). The project was funded for the purpose of developing new methods to improve the US intelligence community's forecasting abilities [26]. However, superforecasting is not suitable for all forecasting problems. Particularly, it is ill-suited for predictions that are either entirely straightforward and well suited for econometric methods, or for predictions that are seemingly impossible. It is also not suitable for existential risk applications [27]. Furthermore, while it may be one of the most powerful forecasting methods available for near-term forecasts, it still is not able to make forecasts any better than a coin toss for events more than five years in the future (Tetlock considers experts no better than normal people at forecasting political events).
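One aggregation idea studied in the superforecasting research program is extremizing: pool individual forecasts in log-odds space, then push the pooled value away from 0.5. A minimal sketch follows, with an illustrative extremizing exponent rather than an empirically tuned one.

```python
import math

# A minimal sketch of extremized log-odds pooling: average forecasts in
# log-odds space, then extremize with exponent a > 1. The value a = 2.5
# is illustrative only; tuned values come from historical accuracy data.

def extremized_pool(probs, a=2.5):
    log_odds = [math.log(p / (1.0 - p)) for p in probs]
    mean = sum(log_odds) / len(log_odds)
    odds = math.exp(a * mean)
    return odds / (1.0 + odds)

crowd = [0.6, 0.65, 0.7]
print(round(extremized_pool(crowd), 3))  # more extreme than the raw mean
```

The rationale is that individual forecasters each hold only part of the available evidence, so the pooled crowd estimate tends to be underconfident; extremizing compensates for that shared underconfidence.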
Combining different types of forecasts that draw from different information sources can be a powerful technique when there is significant uncertainty about the situation or about the different methods [14]. Another useful approach is the adjustment of statistical forecasts using expert judgment, particularly in cases of high uncertainty where domain expertise is critical and environments are poorly defined [28]. In such cases, structured judgmental adjustment can be very effective as long as efforts are made to counter cognitive biases [29].
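These two ideas, equal-weight combination of forecasts from different methods and a structured judgmental adjustment applied as a bounded correction, can be sketched as follows. All numbers are invented for illustration.

```python
# A sketch of forecast combination (a simple average across methods,
# which the combination literature finds hard to beat) plus a structured
# judgmental adjustment applied as a bounded additive correction; the
# bound is one simple way to limit the influence of any single judgment.

def combine(forecasts):
    # equal-weight average across methods
    return sum(forecasts) / len(forecasts)

def adjust(forecast, expert_delta, max_shift):
    """Clamp the expert's adjustment so one judgment cannot dominate."""
    return forecast + max(-max_shift, min(max_shift, expert_delta))

statistical, judgmental = 30.0, 50.0
base = combine([statistical, judgmental])              # 40.0
print(adjust(base, expert_delta=15.0, max_shift=5.0))  # -> 45.0
```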
Scenario planning methods are sometimes considered an adjunct forecasting method, and scenarios are commonly employed to deliver the results of forecasts to decision makers [14]. However, substantial work has been conducted considering their practical use in improving decision making under uncertainty [30,31]. They are considered an essential technique in the technology forecasting and management literature [1]; thus, we devote an entire subsection to them in the following pages.
Goodwin and Wright examine both statistical and judgmental forecasting methods in their ability to aid the anticipation of rare, high-impact events [21]. They find that while all methods have limitations, it is possible to combine dialectical inquiry and components of devil's advocacy with the Delphi technique and scenario planning techniques to improve the anticipation of rare events. In their comparison of techniques (the paper includes a very valuable table comparing the different techniques for anticipating rare events), they consider a number of judgmental methods including expert judgment, structured judgmental decomposition, structured analogies, judgmental adjustment and prediction markets, as well as the Delphi technique and scenario planning.
Selecting experts is a challenging but necessary task when using any of these judgmental forecasting techniques. The first step in identifying experts is to identify the range of perspectives that will be needed in the study [1]. Researchers typically want to prioritize the most knowledgeable experts for vital perspectives first; less vital viewpoints can often use less knowledgeable experts or substitute secondary sources for expert opinion. Researchers should also be cognizant of possible sources of experts' biases when selecting experts and analyzing their responses. Significant attributes of suitable experts include a broad perspective relating to their knowledge of the innovation of interest, the cognitive agility to extrapolate from their knowledge to future possibilities and uncertainties, and a strong imagination [32]. There is also the question of how many experts one needs for a study. This commonly depends on many factors, including the type of the study, the technology of interest and the scope of the study. Sampling diverse populations can lead to many issues; however, when it is necessary, documentation for the particular type of study commonly addresses these issues [33].

Technology Forecasting
Technology forecasting is a challenging task and the body of literature concerning this topic is very broad. To be certain, there is not a well-developed field of study that directly concerns the forecasting of future technologies. Much of what is considered here as technology forecasting literature is focused on technology management, and, consequently, many of the techniques are intended to aid in organizational management and planning.
A wide variety of methods are used for technology forecasting, including both statistical and judgmental techniques. Other techniques are also used, some of which are unique to technology forecasting. Innovation forecasting techniques that rely on bibliometric analysis can be used for mapping scientific domains [34]. Tech mining is a similar technique that harnesses data mining methods to extract information from patent databases and the internet for the purposes of innovation forecasting [35]. Due to the substantial uncertainty, scenario analysis techniques are also widely used for strategic planning involving emerging technologies [1]. This subsection does not revisit the judgmental forecasting techniques discussed in the previous subsection, but focuses on techniques that have not yet been discussed. Scenario analysis is discussed in depth in the following subsection.
Assessing progress - particularly the rate of progress - is essential when developing any type of technology forecasting model. This is so because the naive assumption that historical trends can be extrapolated into the future is often correct, and, consequently, trend extrapolation is a very powerful forecasting technique [1]. Indicators are variables that can be used for extrapolation or for building statistical forecasting models because we believe them to be good predictors of future progress. Ideally, indicators must adhere to three restrictions: 1) the indicator must measure the level of a technology's functionality, 2) the indicator must be applicable to both the new technology and to any older technologies it replaces and 3) there must be a sufficient amount of data available to compute historical values. In reality, indicators satisfying all of these requirements are often not available. In such cases efforts should be made to identify indicators that suffice as well as possible.
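Trend extrapolation from an indicator can be sketched as a least-squares fit of an exponential trend to the indicator's log values, then projecting it forward. The data points below are synthetic.

```python
import math

# A sketch of indicator-based trend extrapolation: fit an exponential
# trend to a performance indicator by least squares on the log values,
# then extrapolate forward. The data points below are synthetic.

def fit_exponential(years, values):
    logs = [math.log(v) for v in values]
    n = len(years)
    xbar, ybar = sum(years) / n, sum(logs) / n
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(years, logs))
             / sum((x - xbar) ** 2 for x in years))
    intercept = ybar - slope * xbar
    return lambda year: math.exp(intercept + slope * year)

years = [0, 1, 2, 3]
values = [1.0, 2.0, 4.0, 8.0]   # an indicator doubling each period
trend = fit_exponential(years, values)
print(round(trend(5), 1))        # extrapolates the doubling trend (~32)
```

The fragility the surrounding text notes applies here directly: the extrapolation is only as good as the assumption that the fitted trend persists.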
Social indicators can include economic factors, demographic factors, educational factors, etc., and they are thought to be analogous to a technology's functional capacity.
Technology roadmapping is a flexible technique that is widely used for strategic and long-term planning [36]. It is known to be particularly effective in structuring and streamlining the research and development process for organizations [37], but it can be used for planning at both an organizational level and a multi-organizational level. It is generally thought to consist of three distinct phases - a preliminary phase, a roadmap construction phase and a follow-up phase - and commonly uses workshops for the map generation phase [38]. When applied, it often uses a structured and graphical technique that enables exploring and communicating future scenarios. However, its lack of rigor and heavy reliance on visual aids can also be seen as weaknesses [1].
Innovation forecasting is a term that is typically associated with the use of technology forecasting methods in combination with bibliometric analysis [34]. In general, bibliometric methods are powerful analysis tools for understanding the progression of science. Such methods have been used for mapping this progression in different scientific disciplines for several decades [39]. Maps of the relational structures present in bibliometric data are useful for visualizing the state of research within the domain(s) of interest and can lead to insights regarding future research directions and geopolitical issues [40].
Tech mining is another notion that is frequently associated with innovation forecasting and management [35]. While we have focused here on judgmental forecasting techniques and other techniques for technology forecasting, there is evidence suggesting that extrapolation and statistical methods are better for forecasting technological progress [41]. Studies have found that technology forecasts developed using statistical methods were more accurate than those developed using other methods, with forecasts about autonomous systems and computers being the most predictable [42]. However, there is certainly no consensus on this topic. Brynjolfsson and Mitchell, writing in the context of forecasting the automation of human tasks given the capabilities of current and future AI technologies, conclude that "simply extrapolating past trends will be misleading, and a new framework is needed" [43]. The holistic perspective proposed here attempts to provide such a framework.

Scenario Analysis
Scenario analysis is a term used in the technology management literature to refer to scenario planning techniques when applied in the context of technology and innovation forecasting [1]. People use scenario analysis naturally, thinking in terms of future scenarios when making most everyday decisions involving uncertainty. It is also a very effective technique for decision making processes in more complex situations [44]. Scenario methods are rooted in strategic planning exercises from the military in the form of 'war game' simulations, or simply wargames. Wargames are a type of strategy game with both amateur and professional uses. For amateurs they are used for entertainment, with some of the earliest examples being the games of Go and chess. Fantasy roleplay games such as Dungeons & Dragons are also derived from wargames and used for entertainment. Professionally, wargames can be used as training exercises or for research into plausible scenarios for highly uncertain environments such as those encountered on battlefields during wartime [45]. Events in World War II, such as the Allied preparations for D-Day, made clear to military commanders the value of wargames and scenario techniques. Following the war, during the 1950s and 1960s, new scenario techniques were independently developed in both the United States and France. In the United States, the methods were developed at the Rand Corporation, a research and development venture of the US Air Force. In France, the techniques were developed for public planning purposes. Although developed independently, these two schools eventually led to the development of very similar scenario techniques.
Scenario analysis, as it is known today, typically involves the development of a number of different scenarios of plausible futures. It is most widely thought of as a qualitative technique for the purposes of strategic planning in organizations [46]. Proponents of this thinking often consider scenarios as an aid for thinking about the future, not for predicting it. However, a rich body of literature has developed over the years, and many quantitative and hybrid techniques have also been shown to be practically useful [20]. Here we describe three schools of scenario techniques: the intuitive logics school, the probabilistic modified trends (PMT) school and La Prospective, a.k.a. the French school. We attempt to outline these different schools below.
The most prominent of the qualitative methods, having received the most attention in the scenario planning literature, is the intuitive logics school [20]. After being developed at the Rand Corporation in the 1950s and 1960s, it was popularized through its use by Royal Dutch Shell in the 1970s - it is sometimes referred to as the 'Shell approach' [19]. This school of methods is founded on the assumption that business decisions rely on a complex web of relationships including economic, technological, political, social and resource-related factors. Here, scenarios are hypothetical series of events that serve to focus attention on decision points and causal processes. While such scenario planning techniques are very useful for business purposes, alternative scenario planning techniques can be used for much more than investigating blind spots in organizations' strategic plans [47].
The most common of the quantitative methods is considered to be the PMT school, which also originated at the Rand Corporation during the 1960s [20,48]. This school incorporates two distinct methodologies: trend-impact analysis (TIA) and cross-impact analysis (CIA) [19]. TIA is a relatively simple concept which involves the modification of extrapolations from historical trends in four steps. CIA attempts to measure changes in the probability of the occurrence of events which could cause deviations from extrapolated trends through cross-impact calculations. The primary difference between the two techniques is the added layer of complexity introduced in CIA by the cross-impact calculation.
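The core of TIA can be sketched as a baseline extrapolation shifted by the probability-weighted impacts of judgmentally assessed events. The events and numbers below are invented, and real TIA additionally models the timing and decay of each impact.

```python
# A minimal sketch of trend-impact analysis (TIA): start from a baseline
# extrapolation, then shift it by the expected impact of judgmentally
# assessed events (probability x impact). Events and numbers are invented
# for illustration; real TIA also models when impacts begin and fade.

def tia_adjusted(baseline, events):
    expected_shift = sum(p * impact for p, impact in events)
    return baseline + expected_shift

baseline_2030 = 100.0   # extrapolated indicator value for a target year
events = [
    (0.3, +40.0),   # a possible breakthrough accelerates the trend
    (0.2, -25.0),   # possible regulation slows it
]
print(tia_adjusted(baseline_2030, events))  # -> 107.0
```

Cross-impact analysis adds the extra layer the paragraph describes: the event probabilities themselves would be updated based on whether other events occur, rather than being treated as independent.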
The two schools described above may do well to illustrate qualitative and quantitative scenario techniques, but they are by no means an exhaustive description of this dichotomy of scenario planning methods. Another way to think of qualitative and quantitative scenarios is as storylines and models. The former captures possible futures in words, narratives and stories, while the latter captures possible futures in numbers and rules of systems' behaviors. Schoemaker notably suggests that the development of quantitative models is an auxiliary option for assisting in making decisions, whereas the development of scenarios is the purpose of the activity [49]. Hybrid scenario techniques attempt to bridge the gap between methods which rely on storylines and those which rely on models.
La Prospective is a school of hybrid scenario techniques that emerged in the 1950s in France for long-term planning and to provide a guiding vision for policy makers and the nation [20]. This school is unique in that it uses a more integrated approach through a blend of systems analysis tools and procedures, including morphological analysis and a number of computer-aided tools [19]. Although it arose independently, this school can be seen to a large extent to combine the intuitive logics and PMT methodologies. A full review of the scenario planning literature is beyond the scope of this work, but we believe these simple characterizations to be sufficient for our purpose.

Scenario Analysis for Mapping
Traditional qualitative scenario planning techniques certainly have a role in assisting decision makers of organizations and other stakeholders involved in the development and governance of transformative AI. However, such techniques can do little to map the plausible paths of AI technology development due to the large space of possible paths. Traditional quantitative methods certainly have a role in some organizational decisions as well. However, while they are commonly sufficient for strategic decision making, they typically fall short for understanding and informing design decisions of complex systems.
Over the past two decades, the use of scenario analysis techniques for mapping complex systems, complex environments and complex technologies has increased [50]. Here, we focus on three such techniques. The first is a relatively obscure method that has seen little practical application, yet it has significant potential for mapping the paths of possible futures for which there are high levels of uncertainty [20]. The second originated as a way to represent social scientific knowledge through directed graphs, and has since become a common method for scenario analysis in multi-organizational contexts [51]. The third extends the second by making those methods computable for quantitative forecasting, but also has practical uses in a large number of applications across various other domains. Each of these techniques offers insight that contributes to the holistic framework proposed here for forecasting transformative AI.
Scenario network mapping (SNM) is a qualitative scenario technique that was proposed to improve upon existing methods by including a substantially larger number of scenarios, each of which forms a portion of a particular pathway of possible events [52]. This results in a network-like structure which is easily updated in the future with the addition, removal and repositioning of scenarios and their interactions in light of new information. A key feature of SNMs is their reliance on the holonic principle, which implies that a scenario can itself be decomposed into further scenarios.
Following the development of the scenario map, the scenarios can be refined further using causal layered analysis techniques [53]. This technique benefits from larger groups of experts because the structure of the network becomes more comprehensive with iterative refinement. In a typical SNM scenario building workshop, several hundred possible scenarios are generated, which are then typically reduced to 30-50 plausible scenarios that are used to create the scenario map. Due to this ability to accommodate a large number of plausible scenarios, we see potential for this method both for its intended purpose and for some derivative of it to be used effectively to identify a large number of possible paths to HLAI.
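A scenario network map can be represented minimally as a directed graph of scenarios; enumerating the paths from the present to an end state then surfaces the distinct development routes the map encodes. The scenario names below are invented placeholders, not claims about actual paths to HLAI.

```python
# A minimal sketch of a scenario network map as a directed graph: nodes
# are scenarios, edges are plausible transitions, and enumerating paths
# from the present to an end state lists the distinct development routes
# the map encodes. Scenario names are invented placeholders.

def all_paths(graph, start, goal, path=None):
    path = (path or []) + [start]
    if start == goal:
        return [path]
    paths = []
    for nxt in graph.get(start, []):
        if nxt not in path:          # avoid revisiting scenarios (cycles)
            paths.extend(all_paths(graph, nxt, goal, path))
    return paths

snm = {
    "today": ["scaling", "new-paradigm"],
    "scaling": ["hlai"],
    "new-paradigm": ["hybrid-systems"],
    "hybrid-systems": ["hlai"],
}
for p in all_paths(snm, "today", "hlai"):
    print(" -> ".join(p))
```

The ease-of-update property noted above corresponds to simple dictionary edits here: adding or removing a scenario or transition changes the enumerated path set without rebuilding the map.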
Axelrod first introduced cognitive maps in the 1970s to represent social scientific knowledge with directed graphs [54]. His work has since been extended to a variety of applications including scenario analysis. However, the psychological notion of cognitive maps - people's representations of their environments and mental models - comes from two decades earlier [55]. Cognitive maps are effective for facilitating information structuring, elaboration, sequencing and interaction among participants or stakeholders [51]. They are sometimes thought of as causal maps because of the causal network of relationships represented in the nodes and edges. Here, nodes can be thought of as scenarios, and the edges describe the causal relationships between them.
Fuzzy cognitive map (FCM) modelling is another hybrid scenario technique that can better integrate expert, stakeholder and historical data through the development of scenarios that assist in linking quantitative models with qualitative storylines [50,56]. FCMs were first proposed by Kosko in the 1980s as a means for making qualitative cognitive maps - used for representing social scientific knowledge [54] - computable by incorporating fuzzy logic. While effective for scenario analysis, FCMs are used generally for decision making and modeling complex systems, and they have a wide variety of applications in multiple domains ranging from online privacy management to robotics [57].
Simply, we can think of FCMs as weighted directed graphs wherein the nodes are fuzzy (taking continuous values from zero to one, rather than discrete values of either zero or one) and represent verbally described concepts, while the edges represent causal effects.
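FCM inference under these definitions can be sketched as follows: node activations in [0, 1] are repeatedly updated from the weighted influences of the other nodes through a sigmoid squashing function until the state settles. The three concepts and the weight matrix are invented for illustration, and the literature contains several variant update rules.

```python
import math

# A minimal sketch of FCM inference. Concepts and weights are invented;
# many alternative update and squashing rules appear in the literature.

def sigmoid(x, lam=1.0):
    # lam sets the steepness; lam = 1 keeps this toy update well-behaved
    return 1.0 / (1.0 + math.exp(-lam * x))

def fcm_step(state, weights):
    n = len(state)
    return [sigmoid(sum(weights[j][i] * state[j] for j in range(n)))
            for i in range(n)]

def fcm_run(state, weights, tol=1e-6, max_iter=200):
    for _ in range(max_iter):
        new = fcm_step(state, weights)
        if max(abs(a - b) for a, b in zip(new, state)) < tol:
            return new
        state = new
    return state

# concepts: [investment, research progress, public concern]
W = [[0.0, 0.8, 0.0],   # investment boosts research progress
     [0.3, 0.0, 0.6],   # progress attracts investment and raises concern
     [-0.4, 0.0, 0.0]]  # concern dampens investment
final = fcm_run([0.5, 0.5, 0.5], W)
print([round(x, 2) for x in final])
```

Scenario analysis with an FCM then amounts to clamping some concepts to hypothesized values and reading off where the remaining activations settle.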

Using Expert Opinion for Scenario Analysis
Virtually all scenario analysis techniques use expert opinion in some way, and there are various ways in which expert opinion is elicited for scenario generation. These techniques include interviews, panels, workshops and the Delphi technique [20]. Many times, specific techniques rely directly on the methods for elicitation of expert opinion being employed. For example, the proprietary Interactive Cross-Impact Simulation (INTERAX) methodology relies on an ongoing Delphi study with close to 500 experts to maintain and update a database of approximately 100 possible events and roughly 50 trend forecasts. Based on six case studies, List suggests that for creating SNMs four half-day workshops with 20 experts is roughly optimal [58].
However, some techniques do not rely on one specific method for the elicitation of expert opinion. FCMs can be developed using expert panels, workshops or interviews. In the case of using interviews, where combining expert opinions is required, all experts' opinions can be treated equally or they can be weighted based on some assessment of confidence in each expert's judgment [56].
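The weighted combination of expert opinions can be sketched for FCMs by averaging the experts' weight matrices under credibility weights, one simple aggregation scheme consistent with the approach just described; equal credibilities reduce it to a plain average. The matrices and credibility scores below are invented for illustration.

```python
# A sketch of combining several experts' FCM weight matrices into one map
# by credibility-weighted averaging. Matrices and credibility scores are
# invented; real studies would derive credibilities from some assessment
# of confidence in each expert's judgment.

def combine_fcms(matrices, credibilities):
    total = sum(credibilities)
    n = len(matrices[0])
    return [[sum(c * m[i][j] for c, m in zip(credibilities, matrices)) / total
             for j in range(n)] for i in range(n)]

expert_a = [[0.0, 0.8], [0.2, 0.0]]   # expert A's causal weights
expert_b = [[0.0, 0.4], [0.6, 0.0]]   # expert B's causal weights
merged = combine_fcms([expert_a, expert_b], credibilities=[2.0, 1.0])
print(merged)  # expert A counts twice as much as expert B
```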

AI Forecasting
The study of forecasting AI and HLAI is in its nascency, and much of the work has relied on expert surveys. The oldest of these dates to a survey conducted in 1972 at a lecture series at University College London [59]. Since 2006, twelve more surveys have been administered [60]. Such surveys have been used to generate forecasts in the form of timelines. The most recent work has aggregated probability distributions collected from participants [10,61,62]. While the collection and aggregation of probability distributions from experts is an improvement upon previous studies on the topic, there remain many shortcomings in trying to quantify long-term forecasts from surveys of expert opinion, the foremost perhaps being the questionable reliability of experts [25].
The most rigorous of the expert survey studies include four particular surveys which have been conducted since 2009, all pertaining to notions of artificial general intelligence (AGI; here, human-level machine intelligence, high-level machine intelligence, human-level artificial intelligence and other similar ideas are all considered simply as notions of AGI). The first was conducted at the AGI-09 conference and found that the majority of experts believed that HLAI would be realized around the middle of the 21st century or sooner [63]. The study also found disagreement among experts concerning the risks involved with AGI and the order of certain milestones (different human-level cognitive tasks) leading to the development of AGI. The next of these studies consisted of a survey that was distributed among four groups of experts: at the conference on Philosophy and Theory of AI in 2011, at the AGI-12 conference, to members of the Greek Association for Artificial Intelligence and to the top 100 authors in artificial intelligence by number of citations as of May 2013 [64]. This survey asked participants when they expected HLMI to be developed, and reported the experts to give a 50% chance of HLMI being developed between 2040 and 2050. These experts further indicated that they believed superintelligence would be created between 2 and 30 years after the emergence of HLMI. Slightly over half of them believed that this would be a positive development, while roughly 30% expected it to have negative consequences.
The next survey solicited the primary authors of the 2015 Neural Information Processing Systems (NeurIPS) conference and the 2015 International Conference on Machine Learning (ICML) [61]. This study questioned participants on their forecasts of HLMI, but also included questions about a large number of specific tasks. All forecasters were asked for 10%, 50% and 90% probabilities, which effectively elicited a probability distribution from each. This was not new, but the analysis, including the aggregation of these probability distributions, was novel in the context of AI forecasting. The results indicated a median of 45 years until the development of HLMI, but, interestingly, a median of 120 years before all human jobs would be automated. The study also found Asian participants to have much earlier predictions than Europeans and North Americans.
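The elicit-and-aggregate step can be sketched in a few lines. This is an illustrative reading rather than the published analysis: each expert's (10%, 50%, 90%) triple (hypothetical values here, in years until HLMI) is fitted to a lognormal, and the fitted distributions are pooled as an equal-weight mixture whose median is then read off.

```python
import math

Z90 = 1.2815515655446004  # 90th percentile of the standard normal

def fit_lognormal(q10, q50, q90):
    """Fit a lognormal to an elicited (10%, 50%, 90%) triple.
    The median fixes mu; sigma averages the two tail estimates."""
    mu = math.log(q50)
    sigma = 0.5 * (math.log(q90 / q50) + math.log(q50 / q10)) / Z90
    return mu, sigma

def lognormal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2))))

def aggregate_median(fits, lo=1.0, hi=500.0, tol=1e-6):
    """Median of the equal-weight mixture of the fitted distributions,
    found by bisection on the averaged CDF."""
    def mix_cdf(x):
        return sum(lognormal_cdf(x, mu, s) for mu, s in fits) / len(fits)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mix_cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical elicitations from three experts (years until HLMI).
experts = [(15, 45, 120), (20, 40, 90), (10, 60, 200)]
fits = [fit_lognormal(*e) for e in experts]
print(round(aggregate_median(fits), 1))
```

Averaging CDFs (a mixture) preserves each expert's spread, whereas averaging the point medians alone would discard the disagreement between experts.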
The most recent expert survey was solicited at the 2018 International Conference on Machine Learning, the 2018 International Joint Conference on Artificial Intelligence and the 2018 Joint Conference on Human-Level Artificial Intelligence [10]. Rather than focusing on notions of AGI, this study elicited five forecasts for different levels of transformative AI. It also included calibration questions, the first expert survey in the context of AI forecasting to do so. While the forecasts were closely aligned with the previous study, an improved statistical model was used. A naïve calibration technique helped the statistical model explain more of the variability in the most extreme transformative AI forecasts. The results also indicated that forecasts from researchers at the HLAI conference were more precise and that this group exhibited lower levels of uncertainty about their forecasts.
A number of meta-analyses of AI forecasting studies have also been conducted. In 2012 and 2014, Armstrong and Sotala and Armstrong et al. assessed previous timeline predictions that had been incorrect [65,66]. They proposed a decomposition schema for analyzing, judging and improving those predictions. Muehlhauser has also conducted examinations of timelines and previous AI forecasts [67,68]. His studies offer the most comprehensive discussion of timelines for notions of AGI prior to the surveys conducted over the past decade. Regarding timelines, Muehlhauser concludes that we have learned very little from previous timelines other than the suggestion that it is likely we achieve AGI sometime in the 21st century. He further explores what we can learn from previous timelines and concludes with a list of ten suggestions for further exploration of the existing literature. AI Impacts 13 is a non-profit organization that is commonly thought to be the leading AI forecasting organization. It has conducted significant work discussing techniques, curating related content and organizing previous efforts for forecasting HLAI, among other research and curation efforts aimed at understanding the potential impacts and nature of HLAI. AI Impacts has contributed significantly to practical forecasting knowledge, even leading a major AI forecasting study in 2016 [61].
Recent work by Amodei and Hernandez presented a trendline for the increase in training costs for major milestones in AI progress between 2012 and 2018 [69]. This trendline depicted exponential growth in the amount of training computation (and hence cost) required for achieving selected AI milestones, with the training compute doubling every 3.5 months. However, several critiques of this have emerged [70,71], the most compelling being that, from a purely economic perspective, the trend is unsustainable over a long period; the exponential growth rate for training costs was significantly greater than the exponential decrease in the cost of compute. Despite these fundamental challenges to the trend, AI experts generally expect it to continue for at least 10 years [10]. While not receiving as much visibility, other efforts have been made to plot or collect relevant data to measure the progress of AI research [72][73][74][75].
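The economic critique can be made concrete with a little arithmetic. This sketch uses the 3.5-month doubling time cited above; the two-year halving time for compute cost is our own illustrative, Moore's-law-like assumption, not a figure from the source.

```python
# A doubling time d (in months) implies a growth factor of 2**(12/d) per year.
doubling_months = 3.5
yearly_factor = 2 ** (12 / doubling_months)          # compute growth per year (~10.8x)
six_year_factor = 2 ** (6 * 12 / doubling_months)    # cumulative over 2012-2018

# If the cost of a unit of compute only halves every ~24 months (illustrative
# assumption), dollar spending must still grow by roughly this factor per
# year to sustain the compute trend:
cost_halving_months = 24
spend_growth = yearly_factor / (2 ** (12 / cost_halving_months))

print(f"{yearly_factor:.1f}x compute/yr, {spend_growth:.1f}x spending/yr")
```

Spending that multiplies severalfold every year quickly exceeds any plausible budget, which is the essence of the unsustainability critique.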
Despite these efforts, the best technology indicator 14 given the criteria previously discussed may be that of Amodei and Hernandez.
While most practical work on AI forecasting to date has relied on expert surveys and extrapolation, there are several important exceptions. Zhang and Dafoe recently conducted a large-scale survey of non-expert opinion intended to assess the opinions of the American public regarding AI progress [62]. Another study, conducted in 2009, used the technology roadmapping technique in an attempt to create a roadmap for the development of HLAI [76]. The results of this workshop depicted expected milestones on the path to HLAI arranged in a two-dimensional grid of individual capability and sociocultural engagement. While organizers of the workshop were disappointed in what they perceived as a failure to generate a straightforward roadmap [77,78], arguably 50% or more of the tasks have since been completed [79].
More recently, the Association for the Advancement of Artificial Intelligence and the Computing Community Consortium have completed a 20-year Roadmap for AI research [80]. This roadmap was less ambitious than the earlier attempt, and focused on three major themes: integrated intelligence, meaningful interaction and self-aware learning. The AI Roadmap Institute 15 has also been created to study, create and compare roadmaps to AGI. Although the institute's efforts have resulted in a roadmap, it concerns social elements more than technical ones.
Another relevant body of research concerns risk analysis. Significant work has been conducted regarding existential risks, and, particularly, the risks posed by AI and superintelligence 16. In 2017 Baum created a comprehensive survey of AGI projects that includes a mapping of all relevant stakeholders at the time [82]. Barrett and Baum conducted two studies during 2017, the first focused on the use of fault trees and influence diagrams for risk analysis, and the second considering expert elicitation methods and aggregation techniques as well as event trees, including a probabilistic framework [83,84]. Also in 2017, Baum et al. examined techniques for modeling and interpreting expert disagreement about superintelligence [85].
Recent work evaluated the methods currently used to quantify existential risk, considering both statistical and judgmental forecasting methods [27]. This study found that while there were no clear 'winners,' the adaptation of large-scale models, the Delphi technique and individual subjective opinions (when elicited through a methodologically rigorous process) had the highest potential.
Furthermore, the authors concluded that surveys met all of the criteria to an acceptable degree and that fault trees, Bayesian networks and aggregated expert opinion were all well suited for quantifying AI existential risks. Prediction markets and superforecasting were not found to be especially suitable in general, or for AI risks specifically.
Other recent work by Avin has considered AI forecasting from the futures studies perspective, including consideration of the use of scenario planning and wargaming as well as standard judgmental and statistical methods [86]. Wargames 17 are particularly promising as they can be used for informing difficult and complex strategic policy decisions [87]. As mentioned earlier, wargaming can serve two valuable purposes in preparing organizations for futures involving a large degree of uncertainty, training and research, and has been suggested by Avin (in the form of an AI scenario role-play game) as a valuable tool for both of these purposes in the AI strategy field. Furthermore, wargaming can be used in a model-game-model analysis framework to iteratively refine different models for how certain future scenarios may unfold [88]. The work of Beard et al., Barrett and Baum, and Avin represents the only known work to explore in any depth the possibilities for judgmental forecasting techniques 18 (other than surveys) and scenario planning techniques for AI strategy purposes [27,83].
14 An example of a social indicator is depicted in Figure 3 (this example is not fundamentally constrained by economic forces).
15 www.roadmapinstitute.org
16 For the purposes of forecasting we do not consider superintelligence, an intelligence explosion or their ramifications [80]. However, in order to forecast HLAI it is necessary to forecast the requisite technologies for its realization. Consequently, forecasts may have some implications on the nature of broadly capable systems that can inform scholarly work on these topics, but we do not believe their consideration to be under the purview of the academic study of forecasting.
17 For the purposes here, professional role-playing games or government simulation games are considered as wargames.
Assessing progress in AI is crucial in order to use extrapolation or other statistical forecasting techniques that require historical data. Consequently, a substantial amount of work has considered theoretical aspects of assessing and modeling AI progress for different ends [89][90][91]. This discussion focuses on some recent efforts and other notable contributions. In 2018, Martinez-Plumed et al. proposed a framework for assessing AI advances using a Pareto surface which attempted to account for neglected dimensions of AI progress [92]. More recently, in 2019, Martinez-Plumed and Hernandez-Orallo built on previous work on item response theory to propose four indicators for evaluating results from AI benchmarks: two for the milestone or benchmark (difficulty and discrimination) and two for the AI agent (ability and generality) [93]. Hernandez-Orallo has written extensively about measures of intelligence intended to be useful for intelligences of all substrates, allowing our existing anthropocentric psychometric tests to be replaced with a universal framework [94]. In contrast to these studies, a measure of intelligence based on creativity has been proposed by Riedl [95]. The topic has also garnered mainstream attention, with a workshop dedicated to it in 2015 [96] and work on it featured in Nature in 2016 [97]. Although there is no consensus on how to measure AI progress or intelligence, it is clear that simple measures which can be represented in a small number of dimensions are elusive.
Work by Brundage has attempted to develop a more rigorous framework for modeling progress in AI [6]. In it he suggests that this type of rigorous modeling process is a necessary precursor to the development of plausible future scenarios for aiding in strategic decision making. To these ends he proposes an AI progress modeling framework that considers the rate of progress in hardware, software, human input elements and specialized AI system performance. In this work, Brundage considered indicators and statistical forecasts as fundamental for modeling AI progress. However, later efforts by Brundage attempted to integrate scenario planning and judgmental forecasting techniques into formal models (i.e., agent-based models and game theory) [98]. While this later work did not result in the proposal of a rigorous framework like his earlier effort, we believe it indicates that the integration of various techniques is necessary for adequately modeling and forecasting AI progress. Moreover, it was successful in identifying numerous challenges posed by such an integration. The framework proposed here draws from this previous work in an attempt to address these challenges.
In the AI governance research agenda, Dafoe discusses the notion of mapping technical possibilities, or the technical landscape, as a research cluster for understanding possible transformative futures [7]. He also notes the important roles of assessing and modeling AI progress. Separately, he discusses AI forecasting and its challenges and includes desiderata for forecasting targets. This paper addresses the issues of generating such a mapping, and the task of forecasting the events comprising it to the greatest degree we are able, as the AI governance research agenda prescribes. The work of Brundage discussed in the previous paragraph is the only other known work to consider an integrated and rigorous methodical approach to the specific problem of AI forecasting 19. This study goes further by building on the need for a mapping described by Dafoe and by considering a broader range of forecasting and scenario analysis techniques to develop a holistic forecasting framework.
18 There are ongoing efforts to use prediction markets for AI forecasting as well as to develop a new type of forecasting platform for AI forecasting.
19 Brundage did not consider any of these techniques from a forecasting perspective in the way we do. Particularly, his work focuses on applying these techniques directly to a model of AI governance.

Summary of the Related Literature
There are generally thought to be two types of forecasting techniques: judgmental and statistical.
Statistical methods are typically preferred when data is available; however, in cases for which data doesn't exist, is missing or for which there are other inherent irreducible uncertainties, judgmental techniques are commonly the best or only options. AI forecasting falls into the latter of these categories. Work from Brundage has previously proposed a general framework for modeling AI progress [6], and later work attempted to integrate scenario analysis, expert judgment and formal modeling [98]. Although most previous studies using judgmental techniques have used expert surveys, there is new evidence that other techniques are more appropriate for this problem [27]. Other potentially valuable techniques, such as tech mining, bibliometric analysis or mapping the technical possibilities, have been suggested but have not been attempted in the literature. This study goes further than previous work by considering a holistic framework which attempts to utilize statistical techniques as best as possible, and to augment their use with judgmental techniques and scenario analysis techniques. We ultimately take a step beyond forecasting to suggest exercises for strategic management and planning.

Judgmental Distillation Mapping
Section 2.3.1 highlighted three scenario analysis techniques that have mapping qualities. We refer to these techniques collectively as scenario mapping techniques due to two significant properties they share: 1) they do not have a strict limit on the number of scenarios they can accommodate and 2) they represent the scenarios as networks with directed graphs (i.e., maps).
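As a concrete illustration of this graph representation, a minimal fuzzy cognitive map can be sketched in a few lines. The three nodes, their edge weights and the choice of update rule here are all hypothetical; FCM node activations take continuous values in [0, 1].

```python
import math

def fcm_step(state, weights):
    """One synchronous FCM update: x_i <- sigmoid(x_i + sum_j w[j][i] * x_j).
    (A common FCM variant that carries the previous activation forward.)"""
    sigmoid = lambda v: 1 / (1 + math.exp(-v))
    n = len(state)
    return [sigmoid(state[i] + sum(weights[j][i] * state[j] for j in range(n)))
            for i in range(n)]

# Hypothetical 3-node map: "compute" and "algorithms" both drive "capability".
W = [[0, 0, 0.8],   # compute -> capability, weight 0.8
     [0, 0, 0.6],   # algorithms -> capability, weight 0.6
     [0, 0, 0.0]]   # capability has no outgoing edges
state = [0.9, 0.5, 0.1]
for _ in range(25):
    state = fcm_step(state, W)
print([round(x, 2) for x in state])
```

Iterating the update until the activations settle shows how influence propagates along the directed edges, which is the property the scenario mapping techniques above exploit.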
Although only three have been identified, other approaches are possible. Here, we draw from the existing techniques to propose a new scenario mapping technique which also exhibits the same mapping characteristics as the techniques described in section 2.3.1. We refer to the proposed technique as judgmental distillation mapping (JDM).
Figure 1
The map created (see Figure 2) is distilled from larger input maps 20 comprised of both historical or tech mining data, and scenarios developed either by previous rounds of the judgmental distillation process, through interviews or through a technical scenario network mapping workshop 21. The scenario map (i.e., the graph) is equivalent in characteristics to an FCM, with advanced technologies represented as nodes. The input nodes represent technologies which forecasters believe can tractably be forecast with existing techniques. Second-order and higher nodes in the map cannot be forecast directly using powerful, traditional techniques such as Delphi, prediction markets or superforecasting. However, these methods are suitable for the first-order nodes as long as they are used in a fashion that generates the probability distributions necessary for computing the timelines of higher-order technologies. These timelines are generated using Monte Carlo simulation and the causal relations between technologies, as determined by expert judgment given the input data. As the final outcomes of transformative AI technology are unknown 22, the resulting map is able to accommodate a variety of outcomes.
Figure 2
JDM is a resource-intensive technique that requires a substantial degree of expertise from the forecaster(s) as well as a large number of participating experts. The primary burdens of decomposition and aggregation fall to the facilitator of the process, and, as noted, this provides significant opportunity for facilitators to exercise their own judgment. If not making all input data available to all experts, substantial effort would be required in delegating portions of the input data, and subsequently developing individualized interview questions and questionnaires for the participating experts. In order to obtain the best results, it is likely best to employ a team for the entire JDM process. Moreover, the best results from the process may be achieved with long periods dedicated to judgmental distillation.
There are three primary inputs for the JDM process: historical data and statistical forecasts, judgmental data and forecasts, and scenarios. Historical data, statistical forecasts and scenarios should initially be in the form of mappings, but such inputs will frequently be deconstructed by the forecasters into forms easily digestible for analysis. Efforts to avoid information overload should be prioritized by forecasters when presenting the information to experts. The questionnaire and interview questions involve asking experts to respond to or comment on scenarios, statistical forecasts, judgmental forecasts and the relationships between these input items. While an example of data that could be shown to interview candidates is shown in Figure 3 and described below, the majority of the JDM process may still be comprised of questions building on qualitative rather than quantitative input data.
23 Monte Carlo simulation is used to generate the distributions found in 2b and 2c. Each distribution is sampled, and the sum of the samples proportional to the strength of the edge weights for each higher order node is taken as the resulting forecast for that node and that sample. A notebook for computing these distributions can be found here: www.github.com/rossgritz/research/.
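The sampling scheme described in footnote 23 can be sketched as follows. This is an illustrative reading rather than the authors' notebook: the two first-order nodes, their triangular arrival-year distributions and the edge weights are all hypothetical.

```python
import random

random.seed(0)

def sample_higher_order(parent_samplers, weights, n=100_000):
    """Monte Carlo timeline for a higher-order node: each sample is the
    edge-weight-proportional sum of one draw from every parent node."""
    total_w = sum(weights)
    out = []
    for _ in range(n):
        draws = [s() for s in parent_samplers]
        out.append(sum(w * d for w, d in zip(weights, draws)) / total_w)
    return out

# Hypothetical first-order nodes with elicited triangular(low, high, mode)
# arrival-year distributions.
lang_understanding = lambda: random.triangular(2025, 2060, 2035)
robust_robotics = lambda: random.triangular(2030, 2080, 2045)

hlai_samples = sample_higher_order(
    [lang_understanding, robust_robotics], weights=[0.6, 0.4])
hlai_samples.sort()
median = hlai_samples[len(hlai_samples) // 2]
print(round(median))
```

The sorted samples give a full probability distribution for the higher-order node, from which a median or any percentile timeline can be reported.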
Figure 3. This depicts a simple extrapolative forecast of a social indicator. This is an example of the type of quantitative information that can be provided to experts for adjustment, distillation and aggregation. When presented with this, experts could be asked whether they agree or disagree that the extrapolation is reasonable. If they disagree, they would be asked to explain how they disagreed and what they thought was a reasonable trend for the indicator presented in the figure. Based on these responses, they may also be questioned about the economic realities governing the behavior of this indicator and whether they believe it is possible, even over a substantially longer timeframe, for these economic factors to be altered such that the indicator may ultimately hit some of the major milestones depicted. They may also be asked questions raising concerns identified by other experts, or questions as to why AI research should or should not be analogous to nuclear physics or rocket science. Careful consideration of the indicators and the questions to ask would be determined by the forecaster, or by a forecasting team.
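A minimal version of the kind of extrapolation shown in Figure 3 can be produced with ordinary least squares on log-transformed values. The indicator series below is entirely illustrative, not the data behind the figure.

```python
import math

# Hypothetical social indicator values by year (illustrative data only).
years = [2014, 2015, 2016, 2017, 2018, 2019]
values = [1.0, 1.4, 2.1, 2.9, 4.2, 6.0]

# Ordinary least squares on log(values) fits exponential growth.
xs = [y - years[0] for y in years]
ys = [math.log(v) for v in values]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
intercept = ybar - slope * xbar

def forecast(year):
    """Extrapolated indicator value for a future year."""
    return math.exp(intercept + slope * (year - years[0]))

print(round(forecast(2025), 1))
```

An extrapolation like this is exactly the artifact an expert would be asked to endorse, adjust or reject during the JDM interviews, since the naive exponential fit embeds no economic constraints.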

A Holistic Framework for Forecasting AI
JDM was developed to integrate a large variety of forecasts into a single mapping that includes probabilistic timelines for its components. However, these results are more model than narrative, and they focus on technological developments rather than economic, political, social or resource-related factors. Moreover, JDM is a technique rather than a broader solution for forecasting, planning and decision making on a continuing basis. The proposed holistic framework utilizes JDM to leverage flexible combinations of powerful forecasting techniques in an attempt to provide a comprehensive solution.
The framework is depicted in Figure 4. In this figure, inputs are depicted as rectangles, required inputs as ovals and actionable forecasts as circles. As depicted in the legend, the elements of the framework comprise three distinct groups of processes: input forecasts, JDM and strategic planning. The inputs are comprised of traditional forecasting and scenario analysis/mapping techniques. JDM is modified for the framework to generate two outputs, one directed back at the next iteration of inputs and the other directed at the strategic planning processes. These strategic planning processes then build on JDM forecasts by considering economic, political and technological aspects with both traditional scenario analysis techniques as well as a powerful existing scenario mapping technique. A strategic AI scenario role-playing game can be used for training as well as scenario refinement. The inputs to the framework are illustrated in Figure 4, and are consistent with the input requirements described for JDM. The quality of the inputs is expected to be strongly correlated with the number of experts, the time requirements and the input forecast quality of JDM. Thus, we anticipate tech mining, indicators (tech & social), interviews and survey results to all be relatively essential for obtaining a reasonable output given a reasonable amount of resources. The non-essential inputs depicted are SNM, the Delphi technique, superforecasting and prediction markets. The SNM input (scenario network mapping*) is not equivalent to the actionable SNM, but is a workshop technique focused on mapping the paths to strong AI [99] that utilizes a highly modified form of SNM specifically developed for this purpose. Alternatively, the form of SNM utilized for strategic planning is consistent with the original intentions of the technique and is discussed at length below with the strategic planning elements of the framework.
JDM was described thoroughly in the previous subsection. However, that discussion considered only the output of the mapping and its timelines. When incorporated into the framework, JDM assumes the dual role of both generating forecasts and informing future component forecasts 24 for iterative application of JDM in the holistic framework. In this way, the framework represents an ongoing forecasting process in which the judgmental distillation process has two objectives and two outputs: one for planning and one for continuing the forecasting process. The figure depicts the first output as moving left to inform actionable strategic planning techniques, and the second as directing qualitative information right, to be merged with updated indicators for developing new targets, forecasts and the next iteration of JDM. This second role performed by JDM, of providing feedback for future iterations, also has the effect of refining the previous forecasts and forecast targets.
Since the process of completing an iteration of JDM in the proposed framework is resource intensive, it may be realistic to iterate over longer time periods, e.g. on an annual basis. This may be more amenable to expert participation because the frequency would be less burdensome to the experts, and with a relatively modest incentive (e.g. a gift card or lodging reimbursement for adding a workshop to an annual conference itinerary) participation may pose less of a challenge than other elements of the framework.
The input forecasts and the JDM process can be thought to work in tandem to produce an updated mapping with timelines on a continuing basis. It is likely that this cyclic pair of processes is sufficient to satisfy the requirements of the AI strategy community for a mapping and timelines [7].
However, we don't have to stop here. The purpose of forecasting is to inform decision makers such that they can make the best decisions, and, in order to do this, we can draw from the methods discussed in the literature review to extend the JDM results so that they are most effectively utilized.
The weakness of the mapping and timelines resulting from JDM is their heavy focus on the technology. In order to make the best strategic decisions, numerous factors must be considered (e.g., economic, political and technological). As discussed earlier, scenario analysis techniques are used to incorporate such factors in the planning and decision making processes 25. Despite being strongly influenced by scenario analysis techniques, JDM does not include consideration of political, social or resource-related factors. Thus, the strategic planning portion of the framework builds on the mapping and timelines for future AI technologies produced by JDM by considering economic, political, social and resource-related factors. It does this by two methods: one for high-level planning (intuitive logics scenarios) and another for exploring granular scenarios (scenario network mapping scenarios). No AI experts are required for these strategic planning elements of the framework.
The use of intuitive logics scenario planning is inspired by its extensive record of success in business applications. Due to their widespread use and popularity, such scenarios may be more acceptable to traditional policy professionals not familiar with AI strategy or advanced scenario planning and technology forecasting methodology (their reliance on narratives makes them more palatable for such persons). In this case the intuitive logics scenario planning technique would be used as intended: to address and plan for uncertainties that are not implicit in the forecasts. This will likely lead to three or four high-level scenarios which can be used for guiding planning and decision-making processes directly. Scenario network maps may have some advantages over intuitive logics scenarios; however, each can play an important role. Intuitive logics scenarios may be suitable for public dissemination, whereas scenario network maps may be too granular and include sensitive information that may not be suitable for public release 26. Because intuitive logics scenarios may be more appropriate for politicians or other stakeholders who are not familiar with more advanced scenario planning techniques, they are sufficient for official reports from institutions utilizing any holistic forecasting framework. When such scenarios are substantiated by a rigorous forecasting methodology, as discussed here, they can be much more effective for affecting public policy decisions.
24 It is also possible that rounds in the JDM process be used to inform component forecasts during the process.
25 While we only consider the use of scenario planning to improve upon the results of the technology maps and timelines generated from JDM, it is also possible to extend the framework to incorporate forecasts of economic and political events into a separate JDM process keeping the technology map fixed. This possible dual use of JDM underscores the power of the holistic forecasting framework.
SNM is also used here, i.e., in the strategic planning context, in the manner that it was intended. However, the workshop technique will need to be modified from that detailed in the SNM manual [101] in order to incorporate the technology map generated from JDM. One of the unique advantages of SNM in this application is its use of the holonic principle, which enables the deconstruction of complex scenarios even further. Thus, Figure 4 includes a self-referential process arrow to indicate the possibility of continuing the scenario decomposition process to the desired level. This could be very useful due to the complex nature of this unique wicked problem [102].
JDM is also a flexible method, and within the holistic framework it can be used to forecast a large variety of AI forecasting targets. The output map would typically be expected to be similar to an FCM, comprised of somewhere between six and twenty nodes. However, there is no fundamental limit on the number of nodes in the map, and it could be possible to retain the holonic principle from an input SNM through the distillation process to the output (or to utilize the holonic principle during the distillation process). Thus, JDM could be used to forecast the automatability of classes of jobs, and even particular jobs or tasks within specific jobs, by means of judgmental decomposition. The details of such a process are not discussed at length here, but it is important to realize that as a forecasting project proceeds to further decompose its targets, the required resources would continue to increase. So, while it may be possible to use the framework for forecasting the automation of individual tasks, it is likely not a reasonable pursuit for most organizations due to resource constraints.

Strengths and Weaknesses
One of the most obvious weaknesses of both JDM and the proposed framework is their heavy reliance on the use and elicitation of expert opinion. These methods may prove difficult to apply when access to experts is limited or biased (as in a single organization). Moreover, the resource requirements may be quite costly with respect to the need for forecasting expertise. The JDM process requires a substantial amount of analysis on the part of the forecaster(s) for deconstructing inputs, creating individualized expert questions and aggregating expert opinion into a single scenario map of AI technologies. The process, as envisioned here, is most likely better suited for teams when working on projects of any reasonable scale. However, the method proposed is flexible and could be revised so as to maintain the holistic framework while reducing reliance on expert judgment. It is unclear whether this would be desirable, but it is worth further examination.
The reliance of the proposed method and framework on expert judgment is also one of their greatest strengths. The literature review indicated that for forecasting problems concerning large degrees of uncertainty, or for forecasts of rare and unprecedented events, statistical forecasting techniques do not suffice and judgmental forecasting techniques are required [14,21]. Furthermore, it has been suggested that only those working closely on advanced AI technologies such as AGI may be qualified to make forecasts for such technologies [103]. However, the framework and method proposed here do not go so far as to remove all elements of statistical or data-based forecasting.
Rather, we believe that all resources should be utilized as best as possible. Thus, the holistic perspective focuses on judgmental techniques while using data-based statistical forecasting techniques to inform them. The framework and method are both inspired by a mixed methods approach to forecasting that utilizes both qualitative and quantitative judgmental methods. This mixed methods character is another of their strengths.

Implications for Practice
Implications for practice are straightforward and have been discussed to some degree in previous sections. However, they raise important questions about the feasibility of practical applications of this technique and framework. For one, the technique is resource intensive and requires a large number of experts for virtually all of the process. Skeptics may see the issue of expert involvement as immediately rendering the method inviable, and while we believe such a perspective may be extreme, the number of required experts is a credible challenge that should be addressed.
There are several ways to consider soliciting experts for participation, and they depend on the nature of the organization pursuing the forecast. Academic organizations may have more trouble incentivizing experts, while organizations like the Partnership on AI, or the companies which comprise it, may be able to leverage member organizations' or employees' cooperation to obtain expert opinion. It is also likely that motivated and well-funded non-profit organizations (e.g., the Open Philanthropy Project) could effectively solicit expert opinion by paying experts appropriately for their time 27. Another considerable challenge is obtaining an appropriate sample of those actively working on relevant projects, and, based on the nature of the work being done, it may be desirable to intentionally collect biased samples [103]. (This would be equivalent to weighting work being conducted at certain organizations.) Perhaps equally as costly for practical implementations of JDM and the proposed framework would be the requirement of forecasting expertise. It may be difficult to maintain full-time forecasting experts on the payroll of any organization, even one created specifically for the task of AI forecasting.
Limited mappings could be developed by a single forecasting expert, and this may be sufficient for demonstrating the viability of the concept. However, the most comprehensive and accurate results would likely be realized with forecasting teams. In such teams it may be more appropriate to retain forecasting experts primarily as advisors for management and consultation while employing early-career forecasters 28 for the majority of tasks.

Implications for Research
AI forecasting is a nascent discipline, and, to date, no unifying document exists. While this work does not intend to be such a document, we do wish to draw attention to the research community's need for one. This document presents a framework that relies on numerous techniques working harmoniously toward a single goal. Thus, research to improve the method and framework may be most effective through analysis of some of the contributing elements. Moreover, work is also necessary which explores this framework, improvements or variations thereof, and alternate ways to incorporate judgmental forecasting methods with statistical forecasting methods and scenario analysis methods (e.g., Brundage [98]). Here we outline some suggestions of this type for future work that could be elaborated upon in the form of an AI forecasting research agenda. A structured research agenda with a coherent vision 29 for the forecasting space could act as the sort of unifying document needed for the AI forecasting research space.

27 Records of annual DeepMind personnel spending suggest that £150 to £250 per hour may be reasonable. If so, using 150 experts (e.g., 50 in each stage) for the JDM process would cost £22,500 to £37,500. While this amount is nontrivial, it is also not cost prohibitive outside of typical academic channels. Paying for experts' time may also be necessary for other components of the study, e.g., using the Delphi technique. Academic efforts to utilize these forecasting methods may be able to solicit expert participation with lesser incentives given institutional prestige, or with incentives structured in different ways, such as those used by Grace [60], where one of each ten participants was given a larger monetary incentive than individuals would otherwise have been given.

28 We estimate that talented, qualified candidates could learn the skills necessary to lead the process in six to twelve months' time. Ideal candidates would have completed major courses of study in both the sciences and humanities, and have experience developing AI models as well as a strong aptitude for forecasting and futures studies. Particularly for developing individualized questionnaires and for conducting technical interviews with experts, it is critical to have a sound understanding of the fundamental technologies being forecast. Also, to ensure the highest fidelity forecasts, it would likely be necessary to retain one senior forecasting expert to direct the research, to facilitate group processes and to conduct critical interviews with the most senior experts.
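The cost estimate in footnote 27 follows from a line of arithmetic. The assumption of roughly one paid hour per expert is ours for illustration; the footnote does not state hours per expert:

```python
# Reproducing the footnote's cost range: 150 experts at £150-£250 per hour,
# assuming (our assumption) roughly one paid hour per expert.
experts = 150
rate_low, rate_high = 150, 250  # £ per hour
cost_low, cost_high = rate_low * experts, rate_high * experts
print(f"£{cost_low:,} to £{cost_high:,}")  # £22,500 to £37,500
```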
This study has illuminated a large number of topics that do not seem to have received appropriate attention thus far in the study of AI forecasting. Other recent work has identified some of these topics as salient [27,86]; however, the previous work has not gone so far as to suggest action to motivate progress in future research. To our knowledge, no literature review exists that is equal in scope to the one presented here with respect to AI forecasting 30. We believe that a major priority going forward in the study of AI forecasting is a large number of comprehensive literature reviews of narrow topics (e.g., the many techniques discussed here) in the context of how they may be used for the tasks involved in AI forecasting. We also see the need for a broad, comprehensive literature review 31. These suggestions are mentioned first because they may be the lowest-hanging fruit while also having significant potential to be very useful.
The literature review here found the body of existing work to be lacking in studies that compared forecasting methods. Of the studies that did, none compared superforecasting, and none considered methods used for the purpose of technological forecasting or for AI forecasting specifically. Moreover, work considering the Delphi technique found academic work assessing its viability to be lacking due to the excessive use of students rather than professionals and experts 32. Since the literature review was not comprehensive, a focused and more extensive effort may illuminate valuable work that has not yet been uncovered 33. Regardless, it is clear that significant future work on methods evaluation and comparison, particularly on the viability of various forecasting techniques in the context of AI forecasting, is required in order to determine how and when the wide variety of methods are suitable in this framework and when they are suitable for AI forecasting purposes more generally. Three methods are depicted in Figure 4 as optional inputs for JDM in the framework: the Delphi technique, prediction markets and superforecasting. It may be that one of these is indeed superior for the majority of related tasks, or that they each can serve certain purposes in a balanced capacity to achieve the best results. Priorities for future research include comparing these three methods directly, as well as comparing the suitability of these methods in the JDM process. Such comparisons are useful both in the context of AI forecasting and in other contexts.
Calibration is another topic related to judgmental forecasting techniques that could be helpful for AI forecasting if better understood. As Gruetzemacher et al. have recently shown, a substantial proportion of AI experts' forecasts are poorly calibrated, so studies to improve and better understand calibration techniques could have a quick and nontrivial return for AI forecasting efforts.

29 The method and framework presented here may contribute to a clear vision for the AI forecasting space, but further input is needed from others with experience in the field to iron out a unified vision. A workshop or meeting of persons working in the space would be useful.

30 While the scope is broad, the depth of the literature review here is lacking.

31 The literature review here may be a good start, but such a literature review is likely appropriate as an independent document as long as it is framed in a manner that emphasizes a unified vision for the study of AI forecasting. We envision something that could be seen to summarize the state of the field of study and act as a suitable document for informing those who want to begin research in this area.

32 The Delphi technique is intended specifically for use with experts. Some studies with students have attempted forecasts of things such as college football, for which students may be considered experts; however, the vast majority of these studies did not [21].
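To make the notion of calibration concrete, below is a minimal sketch of one common check: for forecasts stated as 80% credible intervals, a well-calibrated forecaster's intervals should contain the realized value about 80% of the time. The interval data are invented for illustration and are not drawn from any cited study.

```python
# Minimal interval-coverage calibration check. Each tuple is an 80% credible
# interval (low, high) and the value that was eventually realized.
# These numbers are fabricated for illustration only.
forecasts = [
    (2.0, 5.0, 4.1),
    (1.0, 3.0, 3.5),
    (10.0, 20.0, 12.0),
    (0.5, 1.5, 0.9),
    (4.0, 9.0, 9.5),
]

# A forecaster is well calibrated if the empirical coverage matches the
# stated confidence level (here, 80%).
hits = sum(low <= realized <= high for low, high, realized in forecasts)
hit_rate = hits / len(forecasts)
print(f"coverage: {hit_rate:.0%} (target for 80% intervals: 80%)")
```

With these invented intervals, only three of five contain the realized value, so this hypothetical forecaster is overconfident relative to the stated 80% level.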
Tech mining and bibliometric mapping have a large role to play in the proposed framework, and likely in any AI forecasting framework. Brundage mentions the use of bibliometric methods for mapping inputs [6]; however, we know of no work that has pursued this suggestion. While the foremost priority is likely a review of the related literature, practical work is also desperately needed.
It is likely that these techniques must be used to some degree to demonstrate and/or validate the proposed method and framework, but a more extensive examination of these techniques should also be a priority. A large body of software exists for the mapping of science [104]; however, each flavor produces a different result. We are uncertain as to what form of results will in fact be useful for judgmental distillation, or for alternate forecasting frameworks, and this is of critical interest for an initial inquiry or for numerous simultaneous inquiries. It may be that a number of techniques are valid and can be shown to one expert or different experts in the JDM process. It could also be that an interactive mapping platform is most valuable for judgmental distillation, in that such a platform could enable active navigation through complex maps in three dimensions. If this is the case, and if the needs for AI mapping are not met with existing software, future work could also be necessary for developing, testing and refining a tech-mining-based AI mapping platform. Regardless, it seems imperative that work on these topics be prioritized.
Another topic of interest is that of relevant indicators of AI progress. The literature review here discussed a growing body of work in this direction, and this work is certainly desirable moving forward. The framework proposed here takes a slightly different perspective than a substantial portion of the work toward these ends in that it suggests value in a larger number of salient indicators as opposed to a smaller number. A larger number of indicators is more realistic for the high-dimensional space of AI progress, and trying to reduce progress toward broadly capable systems to a small number of vectors ignores the fundamental uncertainty of the technology that we are trying to forecast. The indicator proposed by Amodei and Hernandez met the criteria for technology indicators relatively well [69], as did some of the other metrics that have been developed for measuring AI progress [93]. Ongoing efforts toward the latter are likely sufficient at this time, but efforts to explore indicators like the former, or like the one depicted in Figure 3, while difficult to identify, should be considered a priority. There is a large range of possible outcomes, and any forecasting framework must consider this.
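As an illustration of how an indicator like Amodei and Hernandez's compute measure can be turned into a quantitative trend, the sketch below fits an exponential to a few data points. The values, units and resulting doubling time are fabricated assumptions for illustration, not measurements from [69]:

```python
import math

# Fabricated compute-style indicator values (arbitrary units), roughly
# exponential in time. A real analysis would use measured estimates.
years = [2012, 2014, 2016, 2018]
compute = [1.0, 30.0, 900.0, 27000.0]

# Least-squares fit of log(compute) = a + b * year; b is the growth rate,
# from which the doubling time follows as ln(2) / b.
n = len(years)
mean_x = sum(years) / n
mean_y = sum(math.log(c) for c in compute) / n
b = sum((x - mean_x) * (math.log(c) - mean_y)
        for x, c in zip(years, compute)) \
    / sum((x - mean_x) ** 2 for x in years)
doubling_time_months = 12 * math.log(2) / b
print(f"doubling time: {doubling_time_months:.1f} months")
```

Fitting in log space keeps the regression linear; the slope directly gives the exponential growth rate, so the doubling time needs no iterative curve fitting.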
Forecasting targets are another critical topic for research efforts. The desiderata presented by Dafoe are an excellent ideal [7]; however, in practice it can be challenging to develop targets. Structured methodologies for developing these targets are highly desirable. Work on such methods is therefore a priority given the significance of including expert forecasts for adjustment in addition to statistical forecasts. For example, workshopping techniques or interview techniques to identify and refine these targets are sorely needed. Targets that suffice may be more realistic than ideal targets, and a list of minimum criteria may be useful in addition to Dafoe's desiderata. An initial effort to expound on Dafoe's work and to examine its strengths and weaknesses could be a simple yet valuable contribution now. Also of interest are the effects of combining statistical forecasts with judgmental forecasts and aggregated expert opinion forecasts for adjustment, or the effects of combining multiple forecasts of other variations. Work exploring these effects could be conducted with or without the use of domain expertise and could have significant implications for how forecasters deconstruct and delegate questions to experts in the judgmental distillation process. Furthermore, the methods of determining forecasting targets' resolution may sometimes be ambiguous, and techniques are necessary for objectively resolving forecasting targets.
Finally, efforts should be undertaken to evaluate both the method and the framework proposed here. It may be possible to evaluate the method objectively in contexts other than AI forecasting. Doing so may be a good step for confirming the viability of the method and the framework; however, evaluation should not be limited to toy cases. An alternate means of validation is to employ the method first in a preliminary fashion 34, as for demonstrating viability, and then to pursue a full-scale implementation of the method and framework. Results from the former could be used for validating and refining the technique and framework through the inclusion of calibration targets ranging from one to three years. If this were done to pilot the proposed method and framework, validation of the method would be confirmed gradually over three years. If successful early on, then further resources could be justified moving forward if performance persisted. This would also have the added benefits of improving training for AI strategy researchers and professionals and improving the scenario planning capabilities of the strategy community. If timeline forecasts were equal to or less accurate than existing methods over short time frames, then qualitative assessment of the benefits to the planning process would also have to be considered, and the forecasting framework could be modified. Other work should also be prioritized that assesses and validates the framework and method in other contexts which may not take as long, or that works to refine and improve the framework and method.

Conclusions
The framework proposed here is not intended to be the solution for AI forecasting; rather, it is intended to illuminate the possibility of considering a holistic perspective when addressing the unique challenges of AI forecasting. It differs from previous work in its holistic perspective and through the development of a new method for judgmental distillation of the salient features of a diverse group of forecasting techniques. By incorporating expert judgment in addition to historical and mined data, it attempts to address the severe uncertainty inherent in AI forecasting while still harnessing the power of statistical methods and scenario analysis in a novel manner through the proposed framework.
We have outlined both a new technique and a new framework in detail and with examples for practical application. We then discussed strengths and limitations of the technique and the framework. Finally, we discussed implications for research and practice, raising key concerns about issues for practice and outlining possible topics, as well as specific studies of immediate value, that could be extended to develop a research agenda in support of holistic forecasting approaches.
Funding: This research received no external funding.
There are two basic types of indicators that are of interest in technology forecasting: science and technology indicators and social indicators. Science and technology indicators, or simply technology indicators, are directly related to the progress of the technology of interest. Social indicators are intended to collectively represent the state of a society or some subset of it. Tech mining generally refers to a broad set of techniques which can be used to generate indicators from data. Porter and Cunningham discuss the use of innovation indicators for understanding emerging technologies, and propose nearly 200 such indicators. Here, we consider tech mining to encompass all bibliometric and scientometric techniques used for the purposes of technology forecasting.
Figure 1 depicts a diagram of the JDM process.

Figure 1. The judgmental distillation mapping technique. The technique is flexible and can be thought of as generally comprising iterative rounds of questionnaires and interviews intended to isolate a scenario map for which forecasts are generated through Monte Carlo simulation.
Figure 2 depicts a possible result of the JDM process. (This figure is not intended as a forecast, but rather as an example of what JDM could result in. Input distributions are randomly assigned using a normal distribution as opposed to aggregated, adjusted results from JDM.)

Figure 2 23. (a) A hypothetical judgmental distillation map is depicted. White ovals are inputs and light grey ovals are next-generation (2nd-order) technologies. General intelligence is depicted in a stacked fashion to indicate the possibility of future technological scenarios in the model being realized through the combination of different paths (i.e., it adheres to the holonic principle). The links in the figure represent causal relationships, and the weights for these links correspond to the strength of these relationships. Note that this figure is not intended to be a forecast, but rather an example of what the JDM process could result in. Input distributions are randomly assigned using a normal distribution. Actual input distributions would not be based on a normal distribution and would be aggregated from expert opinion rather than parameterized distributions. (b) A histogram depicting the results of a Monte Carlo simulation for the next-generation adaptive learning technology. (c) A histogram depicting the results of a Monte Carlo simulation for the next-generation natural language understanding (NLU) technology.
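The Monte Carlo step behind histograms like those in panels (b) and (c) can be sketched as follows. The technology names, normal input distributions and the max-of-parents-plus-lag aggregation rule are our illustrative assumptions; an actual JDM run would use input distributions aggregated from expert opinion and the elicited causal structure of the map.

```python
import random

random.seed(0)

# Hypothetical input technologies: arrival year drawn from a normal
# distribution (mean, sd). Names and parameters are illustrative only.
inputs = {
    "transfer_learning": (2024, 2.0),
    "common_sense_reasoning": (2030, 4.0),
}

def simulate_next_gen(parents, lag_mean, lag_sd, n=10_000):
    """Sample arrival years for a next-generation technology that is
    realized once all parent technologies arrive, plus a development lag."""
    samples = []
    for _ in range(n):
        parent_years = [random.gauss(mu, sd)
                        for mu, sd in (inputs[p] for p in parents)]
        lag = max(0.0, random.gauss(lag_mean, lag_sd))
        samples.append(max(parent_years) + lag)
    return samples

years = simulate_next_gen(["transfer_learning", "common_sense_reasoning"],
                          lag_mean=3.0, lag_sd=1.5)
years.sort()
median = years[len(years) // 2]
print(f"median arrival year: {median:.1f}")
```

The sorted samples are exactly what a histogram like panel (b) or (c) would plot; summary statistics such as the median arrival year fall out of the same sample set.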

Figure 4. The proposed holistic framework for AI forecasting. Rectangular boxes denote inputs, ovals denote required inputs and circles denote actionable forecasts. Inputs to the framework must include scenarios and a mapping of indicators; however, the specific choice of these and the methods for obtaining them are flexible.