The SP Theory of Intelligence, and Its Realisation in the SP Computer Model, as a Foundation for the Development of Artificial General Intelligence

Abstract: The theme of this paper is that the SP Theory of Intelligence (SPTI), and its realisation in the SP Computer Model, is a promising foundation for the development of artificial intelligence at the level of people or higher, also known as ‘artificial general intelligence’ (AGI). The SPTI, and alternatives to the SPTI chosen to be representative of potential foundations for the development of AGI, are considered and compared. A key principle in the SPTI and its development is the importance of information compression (IC) in human learning, perception, and cognition. More specifically, IC in the SPTI is achieved via the powerful concept of SP-multiple-alignment, the key to the versatility of the SPTI in diverse aspects of intelligence, and thus a favourable combination of Simplicity with descriptive and explanatory Power. Since there are many uncertainties between where we are now and, far into the future, anything that might qualify as an AGI, a multi-pronged attack on the problem is needed. The SPTI qualifies as the basis for one of those prongs. Although it will take time to achieve AGI, there is potential along the road for many useful benefits and applications of the research.


Introduction
The theme of this paper is that the SP Theory of Intelligence (SPTI), and its realisation in the SP Computer Model (SPCM), is a promising foundation for the development of artificial intelligence at the level of people, or higher, also known as 'artificial general intelligence' (AGI).
Readers who are not already familiar with the SPTI are urged to read the introduction to it in Appendix A, and perhaps other sources referenced there.
Because AGI is far from being achieved by any system, the focus of this paper is on foundations for the development of AGI, abbreviated as 'FDAGIs'.
In the paper, six alternatives to the SPTI, abbreviated as 'ALTs', are described and compared with the SPTI in terms of their potential as FDAGIs.
Although the SPTI scores well in comparison with the ALTs, it would be wrong, and quite unrealistic, to assume that all research effort should be switched into its development. Since there are many uncertainties between where we are now and, far into the future, anything that may come close to human intelligence, a multi-pronged attack on the problem is needed. And the SPTI qualifies as the basis for one of those prongs.
Although the quest for AGI, if it succeeds, is likely to take some time, any programme of research like this is likely to produce many potential benefits and applications at points along the road.

Presentation
The main sections of the paper are these:

• In Section 2, six ALTs are described.
• Section 3 describes how the ALTs and the SPTI may be evaluated as potential FDAGIs.
• Section 4 evaluates the ALTs and the SPTI as FDAGIs in terms of the criteria in Section 3.
• Section 5 summarises how the SPTI compares with the ALTs, and how the ALTs compare with each other, in terms of the criteria in Section 3.
• The paper concludes that the SPTI is indeed relatively promising as an FDAGI (Section 6).
Abbreviations are listed with their meanings after the Conclusion (Section 6), and the meaning of each one is also made clear where it is first used.
The appendices that follow the Conclusion (Section 6) should not be regarded as part of the main substance of the paper in Sections 1-6. They present background information in support of the main presentation. Although they describe some elements of previous publications in this programme of research, they are merely assisting the main presentation, and should not be regarded as self-plagiarism.

• Appendix A introduces the SPTI, with pointers to where fuller information may be found.
• Appendix B describes strengths of the SPTI in both intelligence-related and non-intelligence-related domains.
• Appendix C describes topics related to the role of information compression (IC) in biology, especially in human learning, perception, and cognition (HLPC), and in the SPTI.
• Appendix D provides an entirely novel perspective on the foundations of mathematics.
• Appendix E describes the benefits of a top-down, breadth-first research strategy with wide scope.

Six Systems That May Serve as FDAGIs
This section describes six systems, each of which is a potential FDAGI.

'The Society of Mind' by Marvin Minsky
In the book The Society of Mind [1], Marvin Minsky writes: "I'll call 'Society of Mind' this scheme in which each mind is made of many smaller processes. These we'll call agents. Each mental agent by itself can only do some simple thing that needs no mind or thought at all. Yet when we join these agents in societies - in certain very special ways - this leads to true intelligence." [1] (Prologue, p. 17), emphasis in the original.
Later, he writes: "Since most of the statements in this book are speculations, it would have been too tedious to mention this on every page.... Each idea should be seen not as a firm hypothesis about the mind, but as another implement to keep inside one's toolbox for making theories about the mind." [1] (Postscript, p. 323).
In short, the society-of-mind idea (SOM) is largely a counsel of despair: the human mind is too complicated for there to be any coherent theory of its structure and workings. Nevertheless, the SOM represents a distinct approach to the development of AI which is relevant to issues considered in this paper.

'Gato' from DeepMind
A team of researchers from the DeepMind company has created "A generalist agent" called 'Gato', described in [2]. They say: "In this paper, we describe the current iteration of a general-purpose agent which we call Gato, instantiated as a single, large, transformer sequence model. With a single set of weights, Gato can engage in dialogue, caption images, stack blocks with a real robot arm, outperform humans at playing Atari games, navigate in simulated 3D environments, follow instructions, and more.
"While no agent can be expected to excel in all imaginable control tasks, especially those far outside of its training distribution, we here test the hypothesis that training an agent which is generally capable on a large number of tasks is possible; and that this general agent can be adapted with little extra data to succeed at an even larger number of tasks." [2] (p. 2).
A "transformer sequence model" mentioned in the quote is "a transformer [deep] neural network akin to a large language model" [2] (Caption to Figure 2, p. 1), and 'transformer' means that different parts of the input data are given different weights, somewhat like human attention.
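As a rough illustration of the weighting idea just mentioned, and not a description of Gato's actual implementation, the following sketch computes scaled dot-product attention for one query over a short sequence. The function names `softmax` and `attention`, and the toy vectors, are assumptions made for this example only.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    largest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Each input position has a key and a value. The weight given to a
    position depends on how well its key matches the query.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    dim = len(values[0])
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(dim)
    ]
    return weights, output


# Toy data (hypothetical): three input positions with 2-dimensional vectors.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

weights, output = attention(query, keys, values)
```

In this toy case the first and third positions, whose keys match the query best, receive larger weights than the second, so they contribute more to the output.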
The paper [2] does not say explicitly that the authors are aiming for AGI, but it is clear that they see Gato as a stepping stone towards AGI: in [2] (p. 18) they reference the book by Nick Bostrom [3] with its main focus on possible dangers from the development of AGI.

'DALL·E 2' from OpenAI
The OpenAI research organisation says on its website openai.com (accessed on 1 December 2022): "OpenAI's mission is to ensure that artificial general intelligence (AGI) - by which we mean highly autonomous systems that outperform humans at most economically valuable work - benefits all of humanity.
"We will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome."
The latest system in this quest is 'DALL·E 2', which on 9 June 2022 was described as follows on the OpenAI website openai.com/dall-e-2/ (accessed on 1 December 2022): "DALL·E 2 is a new AI system that can create realistic images and art from a description in natural language.
"DALL·E 2 can make realistic edits to existing images from a natural language caption. It can add and remove elements while taking shadows, reflections, and textures into account.
"DALL·E 2 can take an image and create different variations of it inspired by the original.
"DALL·E 2 has learned the relationship between images and the text used to describe them. It uses a process called "diffusion," which starts with a pattern of random dots and gradually alters that pattern towards an image when it recognizes specific aspects of that image.
"In January 2021, OpenAI introduced DALL·E. One year later, our newest system, DALL·E 2, generates more realistic and accurate images with 4x greater resolution.
"DALL·E 2 is preferred over DALL·E 1 for its caption matching and photorealism when evaluators were asked to compare 1,000 image generations from each model.
"DALL·E 2 is a research project which we currently do not make available in our API. As part of our effort to develop and deploy AI responsibly, we are studying DALL·E's limitations and capabilities with a select group of users. Safety mitigations we have already developed include: Preventing Harmful Generations ... Curbing Misuse ... Phased Deployment Based on Learning ...
"Our hope is that DALL·E 2 will empower people to express themselves creatively. DALL·E 2 also helps us understand how advanced AI systems see and understand our world, which is critical to our mission of creating AI that benefits humanity."
Overall, it is clear that OpenAI is aiming for AGI, and that DALL·E 2 is seen as a stepping stone in that direction. Hence, in the terms of this paper, it has potential as an FDAGI.

'Soar' from John Laird, Paul Rosenbloom, and Allen Newell
The 'Soar' cognitive architecture was first described in [4] and it is described at various points throughout Allen Newell's book Unified Theories of Cognition [5]. Now, the most comprehensive description of Soar is in The Soar Cognitive Architecture by John Laird [6]. And there is a more recent introduction to Soar in [7].
Some idea of how Soar is organised may be gained from Figure 1, which shows its memories, processing modules, learning modules, and their connections.
[Figure 1. The structure of Soar, showing its memories, processing modules, learning modules, and the interfaces between them.]
In Laird's introduction to Soar [7], it is described like this: "Soar is meant to be a general cognitive architecture [8] that provides the fixed computational building blocks for creating AI agents whose cognitive characteristics and capabilities approach those found in humans [5,6]. A cognitive architecture is not a single algorithm or method for solving a specific problem; rather, it is the task-independent infrastructure that learns, encodes, and applies an agent's knowledge to produce behavior, making a cognitive architecture a software implementation of a general theory of intelligence. One of the most difficult challenges in cognitive architecture design is to create sufficient structure to support coherent and purposeful behavior, while at the same time providing sufficient flexibility so that an agent can adapt (via learning) to the specifics of its tasks and environment. The structure of Soar is inspired by the human mind and as Allen Newell (Newell, 1990) suggested over 30 years ago, it attempts to embody a unified theory of cognition." [7] (p. 1).
Apart from being a cognitive architecture, Soar may of course also be a potential FDAGI.

'ACT-R' from John Anderson, Christian Lebiere, and Others
'ACT-R' (which stands for "Adaptive Control of Thought-Rational") is a cognitive architecture developed by John Anderson and Christian Lebiere, and others, which, like Soar, is inspired by Newell's writings on the need for unified theories of cognition. A relatively full account is in Anderson and Lebiere's book The Atomic Components of Thought [9].
On the ACT-R website act-r.psy.cmu.edu (accessed on 15 December 2022), ACT-R is described as "A theory for understanding and simulating human cognition". In an overview of the system, the authors say: "Adaptive control of thought-rational (ACT-R ...) has evolved into a theory that consists of multiple modules but also explains how these modules are integrated to produce coherent cognition. The perceptual-motor modules, the goal module, and the declarative memory module are presented as examples of specialized systems in ACT-R. These modules are associated with distinct cortical regions. These modules place chunks in buffers where they can be detected by a production system that responds to patterns of information in the buffers. At any point in time, a single production rule is selected to respond to the current pattern. Subsymbolic processes serve to guide the selection of rules to fire as well as the internal operations of some modules. Much of learning involves tuning of these subsymbolic processes." [10] (Abstract).
A schematic representation of the ACT-R cognitive architecture is shown in Figure 2. As with Soar (Section 2.4), ACT-R is a cognitive architecture, but it is also a potential FDAGI.

'NARS' from Pei Wang
"Artificial General Intelligence" is a group of researchers, aiming for AGI, who hold an annual conference about their research. Details of all the conferences held to date may be seen via agi-conference.org (accessed on 25 November 2022). Although AGI is the goal of this research, it is acknowledged that the problem is difficult, and no real breakthrough has yet been achieved.
One of the systems that is associated with that AGI research is Pei Wang's Non-Axiomatic Reasoning System (NARS) (see, for example, Refs. [11-13], and sources referenced there).
NARS is an environment for Non-Axiomatic Logic (NAL): "This [NAL] logic is designed for the creation of general-purpose Artificial Intelligence (AI) systems, by formulating the fundamental regularity of human thinking at a general level." [14] (Location 94). Wang recognises that, to achieve AGI, there must be integration of different aspects of intelligence.
In [11] he discusses how reasoning and learning may be integrated within the NARS environment.
So NARS is a potential FDAGI, and is evaluated in this paper alongside other ALTs and the SPTI.

How the ALTs and the SPTI May Be Evaluated, Viewed as FDAGIs
This section describes how the ALTs and the SPTI may be evaluated, viewed as FDAGIs.
Although the ALTs are not normally seen as scientific theories, they represent ideas about the nature of AGI which seem to need the same kind of evaluation as may be applied to scientific theories. Hence, this section applies some widely-accepted conclusions from the philosophy of science about desirable or mandatory features in any scientific theory. One such feature is falsifiability, meaning that, for the given theory, it should be possible to conceive of evidence that would show that the theory was wrong.

Evaluation Headings
In Section 4, which follows this main section, the ALTs and the SPTI are evaluated under the four headings shown here:

• Simplicity. It may not be possible to measure the Simplicity of a system precisely in terms of bits or bytes of information, but a more informal assessment may nevertheless be useful. A large system should have a low score (0) for Simplicity, while a small system may have a high score (2), and something in between may be given a score of 1.
• Power in Modelling Aspects of Human Intelligence. As with Simplicity, it may not be possible to measure Power precisely in terms of bits or bytes of information, but a more informal measure may nevertheless be useful. A system with high Power should have a high score (2), while a system with low Power should have a low score (0), and something in between may be given a score of 1.
• Other Strengths or Weaknesses. In addition to Simplicity and Power, there may be other strengths or weaknesses of a given system that are relevant to the development of AGI. An assessment that emphasises weaknesses is scored −1, an assessment which is mainly about strengths is scored 1, and when strengths and weaknesses are at least roughly equal, or when there are none, the score is 0. Under this heading it is appropriate to consider anything that suggests that the given ALT or the SPTI, viewed as a theory of intelligence, might be unfalsifiable, as described above.
• Combined Score. To simplify comparisons amongst two or more systems, a 'combined score' is calculated as the sum of the scores from the three headings above.
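The scoring scheme under the four headings above can be sketched in a few lines of Python. The class name `Evaluation`, its field names, and the example scores are hypothetical, introduced only for illustration; the example does not represent a score assigned to any system in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    """Informal scores for one system, following the headings above."""

    simplicity: int  # 0 (large system), 1 (in between), 2 (small system)
    power: int       # 0 (low Power), 1 (in between), 2 (high Power)
    other: int       # -1 (mainly weaknesses), 0 (balanced or none), 1 (mainly strengths)

    def __post_init__(self):
        # Reject scores outside the ranges described in the text.
        if self.simplicity not in (0, 1, 2):
            raise ValueError("simplicity must be 0, 1, or 2")
        if self.power not in (0, 1, 2):
            raise ValueError("power must be 0, 1, or 2")
        if self.other not in (-1, 0, 1):
            raise ValueError("other must be -1, 0, or 1")

    @property
    def combined(self) -> int:
        """Combined score: the sum of the three separate scores."""
        return self.simplicity + self.power + self.other


# A hypothetical system, for illustration only.
example = Evaluation(simplicity=2, power=1, other=0)
example_combined = example.combined

# The combined score can range from -1 (0 + 0 - 1) to 5 (2 + 2 + 1).
lowest = Evaluation(0, 0, -1).combined
highest = Evaluation(2, 2, 1).combined
```

Under these ranges, a combined score can be as low as −1 and as high as 5, which gives a simple scale for comparing systems.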

A Definition of Intelligence
Evaluating the Power of the ALTs and the SPTI in Section 4, below, means assessing their strengths or weaknesses in modelling human intelligence.
Arriving at a meaning for 'human intelligence' that will be widely accepted is difficult, witness the variety of definitions that have been advanced, many of which have been documented by Shane Legg and Marcus Hutter in [17].
As mentioned in Section 3, this section describes a definition of intelligence that may serve in the evaluation of any system, viewed as a potential FDAGI. This definition reflects the state of the SPTI as it has been developed to date. No doubt, other features will be added with further development of the SPTI. And, given the complexity of the concept of human intelligence, it should not be surprising to find that other sets of features, with at least equal claims for their validity, have been adopted by other researchers in other studies.
Some readers may object that the adoption of a definition of intelligence that reflects what the SPTI can do means the creation of an unreasonable bias in favour of the SPTI. That might be true if that definition was the only one to be adopted in the assessment of potential FDAGIs. However, as noted in the previous paragraph, alternative definitions of intelligence are recognised in this paper and given the same weight as the definition in Section 3.4, below.

Taking Account of the Distinction Between the 'Core' of Each System and How It May Be When It Is Enhanced Via Learning or Programming
In Soar, ACT-R, and the SPTI, there is a 'core' to the system which may be enhanced via learning or via programming.
In these cases, it is probably best for the core to be the primary focus of the evaluation. The main reasons are:

• The core is relatively well defined but the way in which the core may be enhanced via learning or programming is less well defined.
• The core comprises aspects of human intelligence, such as those described in Section 3.4, which are probably inborn and not learned.
• It seems likely that features like those described in Section 3.4 are desirable features in any FDAGI. Likewise for other intelligence-related features with comparable validity mentioned near the beginning of Section 3.2.

The SPTI Core as a Definition of Intelligence
It seems likely that, although unsupervised learning is an important part of the SPTI, the features described here are, in people, inborn and not learned. These features, which may be seen as the central 'core' of human intelligence (Section 3.3), are described quite fully in [18] and more briefly in [19].
What follows is an outline of the SPTI definition of intelligence. There is more detail in Appendix B.1.

• Information Compression. In view of substantial evidence for the importance of IC in HLPC [20], IC should be seen as an important feature of human intelligence.
• Natural Language Processing. Under the general heading of "Natural Language Processing" are capabilities that facilitate the learning and use of natural languages. These include:
  - The ability to structure syntactic and semantic knowledge into hierarchies of classes and sub-classes, and into parts and sub-parts.
  - The ability to integrate syntactic and semantic knowledge.
  - The ability to encode discontinuous dependencies in syntax such as the number dependency (singular or plural) between the subject of a sentence and its main verb, or gender dependencies (masculine or feminine) in French, where 'discontinuous' means that the dependencies can jump over arbitrarily large intervening structures.
  - Also important in this connection is that different kinds of dependency (e.g., number and gender) can co-exist without interfering with each other.
  - The ability to accommodate recursive structures in syntax, and perhaps also in semantics.
• Recognition and Retrieval. Capabilities that facilitate recognition of entities or retrieval of information include:
  - The ability to recognise something or retrieve information on the strength of a good partial match between features as well as an exact match.
  - Recognition or retrieval within a class-inclusion hierarchy with 'inheritance' of attributes, and recognition or retrieval within a hierarchy of parts and sub-parts.
• Probabilistic Reasoning. Capabilities here include: one-step 'deductive' reasoning; abductive reasoning; probabilistic networks and trees; reasoning with 'rules'; nonmonotonic reasoning; 'explaining away', meaning 'If A implies B, C implies B, and B is true, then finding that C is true makes A less credible' (in other words, finding a second explanation for an item of data makes the first explanation less credible); probabilistic causal diagnosis; and reasoning which is not supported by evidence.
• Planning and Problem Solving. Capabilities here include:
  - The ability to plan a route, such as for example a flying route between cities A and B, given information about direct flights between pairs of cities including those that may be assembled into a route between A and B.
  - The ability to solve analogues of GAPs in textual form, where a 'GAP' is a Geometric Analogy Problem.
• Unsupervised Learning. Chapter 9 of [18] describes how the SPCM may achieve unsupervised learning from a body of 'raw' data, I, to create an SP-grammar, G, and an Encoding of I in terms of G, where the encoding may be referred to as E. At present the learning process has shortcomings summarised in [19] (Section 3.3) but it appears that these problems are soluble.
  In its essentials, unsupervised learning in the SPCM means searching for one or more 'good' SP-grammars, where a good SP-grammar is a set of SP-patterns which is relatively effective in the economical encoding of I via SP-multiple-alignment (SPMA, Appendix C.1).
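The idea of 'explaining away' listed under Probabilistic Reasoning above can be made concrete with a small worked calculation. The sketch below uses ordinary enumeration of probabilities; it is not how the SPCM derives probabilities, and the prior values `P_A` and `P_C`, the independence of A and C, and the assumption that B is true exactly when A or C is true, are all hypothetical choices made for illustration.

```python
from itertools import product

# Hypothetical prior probabilities of two independent causes, A and C.
P_A = 0.1
P_C = 0.1


def joint(a, c):
    """Joint probability of one (A, C) combination, assuming independence."""
    p_a = P_A if a else 1 - P_A
    p_c = P_C if c else 1 - P_C
    return p_a * p_c


def prob_a_given(condition):
    """P(A | condition), by enumerating the four possible (A, C) worlds."""
    worlds = [
        (a, c)
        for a, c in product([True, False], repeat=2)
        if condition(a, c)
    ]
    total = sum(joint(a, c) for a, c in worlds)
    return sum(joint(a, c) for a, c in worlds if a) / total


# Assumption: B is true exactly when A or C is true (each cause implies B).
p_a_given_b = prob_a_given(lambda a, c: a or c)

# Now we also learn that C is true, which already accounts for B.
p_a_given_b_and_c = prob_a_given(lambda a, c: (a or c) and c)
```

With these assumed values, knowing only that B is true raises the probability of A from 0.1 to 0.1/0.19, a little over one half. Learning that C is also true returns it to the prior of 0.1, because C already accounts for B.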

Evaluation of the ALTs and the SPTI as FDAGIs
This section evaluates the ALTs and the SPTI in terms of the criteria described in Section 3. These evaluations have been made carefully, trying to avoid biases or distortions arising from the author's association with the SPTI.

The SOM
Since no computational core is specified for the SOM, the evaluation necessarily focuses on the nature of the system when it is populated by many small agents:

• Simplicity, Score = 0. At first sight, the SOM looks simple: intelligence is merely lots of little agents. However, a little reflection suggests that in the SOM as Minsky describes it:
  - There would be large numbers of agents, which collectively would be far from simple.
  - Since IC is not mentioned, we may assume that there would be no corresponding benefit via the simplification of those many agents.
  In short, the SOM is weak in terms of Simplicity and it has been assigned a corresponding score of 0.
• Power, Score = 0. Regarding the (intelligence-related) descriptive/explanatory Power of the SOM:
  - In the second of the quotes in Section 2.1, Minsky says: "... most of the statements in this book are speculations" [1] (Postscript, p. 323). In other words, there is little evidence in support of the proposals in the book.
  - The SOM concept of human intelligence is little better than a theory that merely redescribes what it is meant to explain (Appendix C.1). An example may be seen in [1] (p. 20) where Minsky suggests that the picking up of a cup of tea by a person may be analysed into such agents as one for grasping the cup, an agent for balancing the cup to avoid spills, an agent for the thirst of the person picking up the cup, and an agent for moving the cup to the person's lips.
  In short, the SOM is weak in terms of its descriptive/explanatory Power and it has been assigned a corresponding score of 0.
• Other Strengths or Weaknesses, Score = −1. Apart from the weakness of the SOM in terms of Simplicity and Power, it is weak because, in terms of Popper's ideas, the theory is unfalsifiable. This is because any proposed falsification of the theory could be met by the addition, omission, or substitution of agents within the theory. The SOM is too malleable to be plausible as a theory of human intelligence. Because of this additional weakness in the SOM, it has been assigned a score of −1 for 'other strengths or weaknesses'.

Gato
Since the case for Gato depends critically on the idea that intelligence may be built up via the learning of many more-or-less simple skills, it should be evaluated in terms of how it will be when many such skills have been learned.

• Simplicity, Score = 1. For Gato to do many things, as indicated in the quote at the beginning of Section 2.2, it must be trained in all of them (apart from the ability to learn, which is needed to acquire those many skills). And training across many such tasks means adding a substantial amount of information to the core system. From the perspective of Simplicity, things do not look so good for Gato. However, since Gato can achieve quite a lot with a small computational core, it seems reasonable to give it a middling score of 1.
• Power, Score = 2. As noted in [2] (p. 2), Gato, with appropriate training, "... can ... engage in dialogue, caption images, stack blocks with a real robot arm, outperform humans at playing Atari games, navigate in simulated 3D environments, follow instructions, and more." Although the researchers' strategy is not entirely clear, it seems that the Gato research is based on the assumption in [2] that any artificial system that can learn the kinds of things that people can learn may be seen to have achieved AGI, or it will have done after it has been scaled up. This assumption has been adopted by at least one of the authors of [2]. There are (at least) two difficulties with the assumption that AGI is merely the ability to learn the variety of skills that can be learned by people:
  - There is an implicit assumption that the learning of those skills is achieved via supervised learning or reinforcement learning ("For simplicity Gato was trained offline in a purely supervised manner; however, in principle, there is no reason it could not also be trained with either offline or online reinforcement learning (RL)" [2] (p. 3)), but it appears that most human learning is unsupervised, meaning that it is learning without predefined associations between, for example, words and pictures, and without rewards or punishments [18] (Chapter 9).
  - Gato suffers from a weakness that is similar to that in the SOM and described in Section 4.1 (item 'Power'): the belief that human intelligence is merely a collection of skills, somewhat like the discredited theory that human cognition may be understood as a collection of instincts including such implausible instincts as a putting-on-of-shoes instinct, a planting-of-seeds instinct, a car-driving instinct, and so on.
  With regard to the second point, an alternative view, adopted with varying degrees of confidence by many people working in AI, is that some capabilities are more central to the concept of intelligence than others (Section 3.2). It seems, for example, that skills like those outlined in Section 3.2 are likely to be inborn and fundamental in human intelligence, whereas others such as how to drive a car, how to play cricket, how to play the piano, and so on, are learned and not inborn, and only tangentially relevant to our concept of intelligence. Thus, in brief, with regard to the Power of Gato: the model of learning that has been adopted in Gato is unlikely to conform to the fundamentals of learning in people; and the emphasis in Gato on the variety of skills that the system may learn says little or nothing about the fundamentals of intelligence. It is possible that, like the SOM, Gato is too malleable to be plausible as a theory of human intelligence. However, since it is not impossible that human intelligence is merely a knowledge of many skills, and since Gato clearly has the potential to model many of them, the Power of Gato is hereby assigned a score of 2 (in the range 0 to 2).
• Other strengths or weaknesses, Score = −1. Although Gato, with appropriate training, can demonstrate a variety of capabilities, we may say in a similar way that, with the installation of appropriate apps, a smartphone or any ordinary computer can demonstrate a variety of capabilities, and these may include AI capabilities. Does this make the smartphone or ordinary computer a good FDAGI? Certainly not. This is an additional reason to doubt the potential of Gato as an FDAGI. Apart from that, the reliance of Gato on a variant of the DNN model is a weakness that stems from the well-known shortcomings of DNNs, most of which are summarised in Appendix B.2.2. Hence Gato has been assigned a score of −1 under the heading 'other strengths or weaknesses'.

DALL·E 2
From the description of DALL·E 2 in Section 2.3, it is clear that, in attempting to integrate two aspects of intelligence (the processing of images and the processing of natural language), it is being developed within a bottom-up strategy with its associated problems, described in Appendix E. This has a bearing on judgements both of its Simplicity and of its descriptive/explanatory Power. As with Gato, these shortcomings are why DALL·E 2 has been assigned a score of −1 for 'other strengths or weaknesses'.

Soar
Soar is one of the systems mentioned in Section 3.3 which is designed as an intelligence-related 'core' which needs to be programmed to create one or more specific systems. Hence, it is evaluated in this and following subsections in terms of its intelligence-related core features rather than the features of any of the systems that may be derived from the Soar architecture via programming.

•
Simplicity, Score = 1.In keeping with what is said about ACT-R, and the SPTI, in Section 3.3, it seems best to evaluate the Simplicity of Soar in terms of the size of its important computational 'core'.Since that core expresses several aspects of intelligence via mechanisms for achieving those aspects of intelligence (next bullet point), and since, overall, there is little simplification via integration (as suggested by the structures shown in Figure 1), it seems reasonable to say that Soar has a middling score of 1 for Simplicity, that it is neither very complex nor very succinct.

• Power, Score = 2. In John Laird's book, The Soar Cognitive Architecture [6], the way in which Soar expresses aspects of human intelligence is described largely via descriptions of the mechanisms within Soar which relate to each aspect. Nevertheless, the descriptive/explanatory Power of Soar with respect to each aspect of intelligence comes over reasonably clearly. For example:
- Chapter 6 describes how Soar achieves 'chunking', a widely-recognised aspect of human learning which became prominent largely because of George Miller's paper about "The magical number seven, plus or minus two: ..." [26].
- Chapter 7 describes how reinforcement learning may be achieved via the modification or tuning of existing rules. Although unsupervised learning is probably more fundamental, there is no doubt that reinforcement can play a part in learning.
- Chapter 8 (co-authored with Yongjia Wang and Nate Derbinsky) describes Soar's semantic memory, "... a repository for long-term declarative knowledge that supplements what is contained in short-term working memory (and production memory)." [6] (p. 203).
- And so on.
Clearly, Soar embodies a definition of intelligence that is different from that described in Section 3.2. However, as noted in that section, there may be other definitions of intelligence with equal validity. Overall, it seems reasonable to say that the Power of Soar in modelling human intelligence is good, so it has been assigned a score of 2.

• Other strengths or weaknesses, Score = 0. There seem to be no other notable strengths or weaknesses of Soar.

ACT-R
The background to ACT-R is similar to that of Soar, and the evaluation here is similar.
• Simplicity, Score = 1. In keeping with what is said about Soar, ACT-R, and the SPTI, in Section 3.3, it seems best to evaluate the Simplicity of ACT-R in terms of the size of its important computational 'core'. Since that core expresses several aspects of intelligence via mechanisms for achieving those aspects of intelligence (next bullet point), and since, overall, there is little simplification via integration (as suggested by the structures shown in Figure 2), it seems reasonable to say that the Simplicity of ACT-R is moderate, and to assign it a score of 1.

• Power, Score = 2. ACT-R embodies a fairly detailed model of intelligence, which is implicit in the structure of ACT-R (Section 2.5, second bullet point) and is different from the definition specified in Section 3.2. However, as with Soar, there may be definitions of intelligence that are different from that in Section 3.2 but with equal validity. Overall, it seems reasonable to say that the Power of ACT-R in modelling human intelligence is good, and to assign it a score of 2.

• Other strengths or weaknesses, Score = 0. There seem to be no other notable strengths or weaknesses of ACT-R.

NARS
Like DALL·E 2, NARS is being developed via a bottom-up strategy (Appendix E), and, so far, only two aspects of intelligence have been considered: Non-Axiomatic Logic and learning (Section 2.6). As will be described, this has a bearing on estimates of the Simplicity and Power of NARS.

• Simplicity, Score = 1. Although a mature version of NARS that embraces several aspects of intelligence is likely to be relatively large, it seems best to stick with the assessment of the Simplicity of NARS as it is now. Hence, although the present Simplicity of NARS arises merely because it is at an early stage of a bottom-up process, that Simplicity is assessed as moderate, and it has been assigned a score of 1.

• Power, Score = 1. Since NARS is still at an early stage in the integration of different aspects of intelligence, it seems reasonable to judge its descriptive/explanatory Power to be moderate, and to assign it a score of 1.

• Other strengths or weaknesses, Score = 0. There seem to be no other notable strengths or weaknesses of NARS.

SPTI
• Simplicity, Score = 2. An important feature of the SPTI (including the SPCM) is that all the intelligence-related Power of the system, summarised in the next bullet point, flows from the computational 'core' of the SPCM, without the need for any kind of additional learning or programming. Viewed as a model of natural intelligence, all of the SPCM's intelligence-related 'core' may be seen as inborn capabilities, present at the system's 'birth'.
That computational core is remarkably simple: it is largely the SPMA concept (Appendix A.2) and the SPTI's procedures for unsupervised learning (Appendix A.4), much of which is the repeated application of the SPMA concept. In other words, the SPCM is largely the SPMA concept.
In short, the SPTI with the SPCM is remarkably small, meaning that its Simplicity is strong. Accordingly, it has been given the highest available score of 2.
A point that deserves emphasis here is that, in the SPTI, Simplicity may be combined with repetition of information, as described in Appendix C.3. In case this sounds like nonsense, the point is simply that the SPMA concept may be applied in several different aspects of intelligence: in hearing, in vision, in touch, and so on. That a powerful technique for compression of information may be applied in several different areas of brain function is a point that appears to have been missed in the ALTs described in this paper, and probably in other systems as well. This versatility is itself due to the way in which the SPMA concept is a generalisation of six other methods for compression of information via ICMUP [27]. Thus, for two related reasons (the versatility of the SPMA in modelling human intelligence and beyond, and, more generally, the central role for IC in the SPTI), a score of 1 has been assigned to the SPTI under the heading 'other strengths or weaknesses'.

Comparison of the SPTI with the ALTs
Table 1 summarises the evaluation scores assigned to the ALTs and the SPTI in Section 4.
Table 1. This table summarises the evaluation scores assigned to the ALTs and the SPTI in Section 4. "Other S/W" is short for "Other strengths or weaknesses".

System    Simplicity    Power    Other S/W    Combined Score

Assuming that the evaluations are fair, and care has been taken to ensure that they are, the SPTI, with a combined score of 5, stands well above the ALTs, the best of which, Soar and ACT-R, each have a combined score of 3.
The main reason for the relative strength of the SPTI is the concept of SP-multiple-alignment, which is largely responsible for the versatility of the SPCM, both within AI and beyond, and for its small size.
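The way the combined scores are obtained can be made concrete with a minimal sketch. It assumes, as the paper says elsewhere of its scoring example, that the three measures are simply added together, and it includes only the systems whose individual scores are stated in this section. The Power score of 2 for the SPTI is not stated above; it is inferred here from the stated combined score of 5, the Simplicity score of 2, and the score of 1 for other strengths. The dictionary layout and function name are illustrative assumptions.

```python
# Scores as given in the text of this section. The SPTI's Power score is an
# inference (combined 5 - Simplicity 2 - Other 1), not a quoted value.
scores = {
    "Soar":  {"simplicity": 1, "power": 2, "other": 0},
    "ACT-R": {"simplicity": 1, "power": 2, "other": 0},
    "NARS":  {"simplicity": 1, "power": 1, "other": 0},
    "SPTI":  {"simplicity": 2, "power": 2, "other": 1},
}


def combined_score(entry):
    """Combine the three measures by simple addition."""
    return entry["simplicity"] + entry["power"] + entry["other"]


combined = {name: combined_score(entry) for name, entry in scores.items()}

# Systems ordered from highest to lowest combined score.
ranking = sorted(combined, key=combined.get, reverse=True)

for name in ranking:
    print(f"{name:6s} {combined[name]}")
```

Simple addition weights Simplicity, Power, and other strengths equally; a different weighting would be a different assumption and could change the ordering.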

Conclusion
This paper argues that the SP Theory of Intelligence and its realisation as the SP Computer Model is a promising foundation for the development of human-like broad AI, at the level of humans or beyond, also known as Artificial General Intelligence (AGI).
In that connection, the main intelligence-related strengths of the SPTI, and other strengths, are summarised in Appendix B. It appears that the SPTI has significant advantages over six other systems, chosen to be representative of potential foundations for the development of AGI.
The Simplicity of the SPTI, and its Power in modelling aspects of human intelligence, is almost entirely due to the powerful concept of SP-multiple-alignment (Appendix A.2), itself part of the ICMUP approach to the achievement of information compression (Appendix C.2).
As noted in the Introduction, it would be a mistake for all available research eggs to be put into one basket. Given the uncertainties in any vision of how AGI might be achieved, it would make more sense to hedge our bets with parallel streams of research. The SPTI qualifies as the basis for one of those streams of research.
Again, as noted in the Introduction, achieving AGI is likely to be far into the future, but research that is aiming for AGI is likely to produce many potential benefits and applications at points along the road to AGI. Some of those potential benefits and applications from the SPTI are described in Appendix B.2.
As noted in Appendix C.1, it is intended that 'SP' should be treated as a name, like 'IBM' or 'BBC', not an abbreviation.

Appendix A. The SP Theory of Intelligence and the SP Computer Model in Brief
This appendix introduces the SP Theory of Intelligence and its realisation in the SP Computer Model with sufficient detail to ensure that the rest of the paper is intelligible. More detail may be found in the paper [19], and there is a much fuller account of the system in the book Unifying Computing and Cognition [18].
The SPTI is conceived as a brain-like system as shown in Figure A1, with New information (green) coming in via the senses (eyes and ears in the figure), and with some or all of that information compressed and stored as Old information (red), in the brain.
As described in more detail below, the processing of New information to create Old information is central in how the SPCM works and lies at the heart of the strengths of the SPTI, outlined in Appendix B.

Appendix A.1. SP-Patterns and SP-Symbols
In the SPTI, all information is represented by SP-patterns, where an SP-pattern is an array of SP-symbols in one or two dimensions. An SP-symbol is simply a mark from an alphabet of alternatives that can be matched in a yes/no manner with any other SP-symbol.
Examples of SP-patterns may be seen in Figure A2, as described in the caption to the figure.
At present, the SPCM works only with one-dimensional SP-patterns but it is envisaged that, at some stage, the SPCM will be generalised to work with two-dimensional SP-patterns as well as one-dimensional SP-patterns. This should open up the system for the representation and processing of diagrams and pictures, and, as described in [28] (Sections 6.1 and 6.2), structures in three dimensions.
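As an illustration of these definitions, a one-dimensional SP-pattern can be sketched as a sequence of atomic symbols with an all-or-nothing equality test. This is a minimal sketch and not the data structure used in the SPCM: the type aliases, the function names, and the choice of space-separated strings as an input format are assumptions made for this example. The pattern 'D 17 t h e #D' is taken from the parsing example in Appendix A.2.

```python
from typing import Tuple

SPSymbol = str                      # an atomic mark with no internal structure
SPPattern = Tuple[SPSymbol, ...]    # a one-dimensional array of SP-symbols


def symbols_match(a: SPSymbol, b: SPSymbol) -> bool:
    """Yes/no matching: two SP-symbols are either the same mark or not."""
    return a == b


def parse_pattern(text: str) -> SPPattern:
    """Read a space-separated string as a one-dimensional SP-pattern."""
    return tuple(text.split())


old = parse_pattern("D 17 t h e #D")
new = parse_pattern("t h e")

print(old)                              # six SP-symbols
print(symbols_match(old[2], new[0]))    # 't' against 't'
print(symbols_match(old[1], new[0]))    # '17' against 't'
```

Note that '17' is treated as a single mark with no numeric meaning, in keeping with the remark in Appendix A.9 that SP-symbols have no intrinsic meaning.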

Appendix A.2. The SP-Multiple-Alignment Concept
The concept of SP-multiple-alignment (SPMA) is described in outline here.
The SPMA concept is largely responsible for the strengths of the SPTI as summarised in Appendix B. Apart from some additional programming in the procedures for unsupervised learning (Appendix A.4), the SPMA concept is the means by which the SPCM achieves IC.
The SPMA concept in the SPCM has been borrowed and adapted from the concept of 'multiple sequence alignment' in bioinformatics [19] (Section 4). An example of an SPMA is shown in Figure A2.
Figure A2. The best SPMA created by the SPCM that achieves the effect of parsing a sentence ('t h e p l u m s a r e r i p e') into its parts and sub-parts, as described in the text. The sentence in row 0 is a New SP-pattern, while each of rows 1 to 9 contains a single Old SP-pattern, drawn from a repository of Old SP-patterns.
Here is a summary of how SP-multiple-alignments like the one shown in Figure A2 are formed:

1. At the beginning of processing, the SPCM has a store of Old SP-patterns including those shown in rows 1 to 9 (one SP-pattern per row), and many others. When the SPCM is more fully developed, those Old SP-patterns would have been learned from raw data as outlined in Appendix A.4, but for now they are supplied to the program by the user.

2. The next step is to read in the New SP-pattern, 't h e p l u m s a r e r i p e'.

3. Then the program searches for 'good' matches between SP-patterns, where 'good' matches are ones that yield relatively high levels of compression of the New SP-pattern in terms of Old SP-patterns with which it has been unified.

4. As can be seen in the figure, matches are identified at early stages between (parts of) the New SP-pattern and (parts of) the Old SP-patterns 'D 17 t h e #D', 'Nrt 6 p l u m #Nrt', 'V Vpl 11 a r e #V', and 'A 21 r i p e #A'.

5. In SPMAs, IC is achieved by the merging or unification of SP-patterns, or parts of SP-patterns, that are the same, like the match between 't h e' in the New SP-pattern and the same three letters in the Old SP-pattern 'D 17 t h e #D'.

6. The unification of 't h e' with 'D 17 t h e #D' yields the unified SP-pattern 'D 17 t h e #D', with exactly the same sequence of SP-symbols as the second of the two SP-patterns from which it was derived.

7. The details of how IC for any one SPMA is calculated are given in [19] (Section 4.1) and [18] (Section 3.5).

8. As processing proceeds, similar pair-wise matches and unifications eventually lead to the creation of SP-multiple-alignments like that shown in Figure A2. At every stage, all the SP-multiple-alignments that have been created are evaluated in terms of IC, and then the best SP-multiple-alignments are retained and the remainder are discarded. In this case, the overall 'winner' is the SPMA shown in Figure A2.

9. This process of searching for good SP-multiple-alignments in stages, with selection of good partial solutions at each stage, is an example of heuristic search. This kind of search is necessary because there are too many possibilities for anything useful to be achieved by exhaustive search. By contrast, heuristic search can normally deliver results that are reasonably good within a reasonable time, but it cannot guarantee that the best possible solution has been found.
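The staged search summarised in the steps above can be sketched as a small beam search. This is a toy illustration under stated assumptions, not the algorithm of the SPCM: it builds only a flat covering of the New SP-pattern by word-level Old SP-patterns (no higher-level rows), it uses the number of New symbols matched as a crude stand-in for the IC measure described in [19] (Section 4.1), and it treats single lowercase letters as the 'content' of a pattern. The function names, the beam width, and the 'a p p l e' pattern (standing in for the 'many others' in the store) are hypothetical.

```python
def content_symbols(pattern):
    """Toy convention (assumed): content symbols are single lowercase letters."""
    return tuple(s for s in pattern if len(s) == 1 and s.islower())


def beam_align(new, olds, beam_width=3):
    """Cover `new` with Old patterns in stages, keeping the best partial
    solutions at each stage and discarding the rest."""
    # Each partial solution: (New symbols matched, position in New, Old patterns used)
    beam = [(0, 0, ())]
    finished = []
    while beam:
        candidates = []
        for matched, pos, used in beam:
            if pos >= len(new):
                finished.append((matched, used))
                continue
            # Option 1: leave the New symbol at `pos` unmatched.
            candidates.append((matched, pos + 1, used))
            # Option 2: unify the content of an Old pattern with New at `pos`.
            for old in olds:
                body = content_symbols(old)
                if body and new[pos:pos + len(body)] == body:
                    candidates.append(
                        (matched + len(body), pos + len(body), used + (old,))
                    )
        # Selection: retain only the best partial solutions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    return max(finished, key=lambda f: f[0])


new = tuple("t h e p l u m s a r e r i p e".split())
olds = [
    tuple("D 17 t h e #D".split()),
    tuple("Nrt 6 p l u m #Nrt".split()),
    tuple("V Vpl 11 a r e #V".split()),
    tuple("A 21 r i p e #A".split()),
    tuple("Nrt 7 a p p l e #Nrt".split()),   # hypothetical distractor
]

matched, used = beam_align(new, olds)
print(matched, "of", len(new), "New symbols matched")
for pattern in used:
    print(" ".join(pattern))
```

Because only `beam_width` partial solutions survive each stage, the search stays cheap but, as the text notes for heuristic search in general, it cannot guarantee the best possible result.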
Appendix A.9. Unfinished Business
Like most theories, the SP theory has shortcomings, but it appears that they may be overcome. At present, the most immediate problems are:

• Processing of Information in Two or More Dimensions. No attempt has yet been made to generalise the SP model to work with patterns in two dimensions, although that appears to be feasible, as outlined in [18] (Section 13.2.1). As noted in [18] (Section 13.2.2), it is possible that information with dimensions higher than two may be encoded in terms of patterns in one or two dimensions, somewhat in the manner of architects' drawings. A 3D structure may be stitched together from several partially-overlapping 2D views, in much the same way that, in digital photography, a panoramic view may be created from partially-overlapping pictures [28] (Sections 6.1 and 6.2).

• Recognition of Perceptual Features in Speech and Visual Images. For the SP system to be effective in the processing of speech or visual images, it seems likely that some kind of preliminary processing will be required to identify low level perceptual features such as, in the case of speech, phonemes, formant ratios, or formant transitions, or, in the case of visual images, edges, angles, colours, luminances, or textures. In vision, at least, it seems likely that the SP framework itself will prove relevant since edges may be seen as zones of non-redundant information between uniform areas containing more redundancy and, likewise, angles may be seen to provide significant information where straight edges, with more redundancy, come together [28] (Section 3). As a stop-gap solution, the preliminary processing may be done using existing techniques for the identification of low-level perceptual features [34] (Chapter 13).

• Unsupervised Learning. A limitation of the SP computer model as it is now is that it cannot learn intermediate levels of abstraction in grammars (e.g., phrases and clauses), and it cannot learn the kinds of discontinuous dependencies in natural language syntax that are described in [19] (Sections 8.1 and 8.2). I believe these problems are soluble and that solving them will greatly enhance the capabilities of the system for the unsupervised learning of structure in data [19] (Section 5.1).

• Processing of Numbers. The SP model works with atomic symbols such as ASCII characters or strings of characters with no intrinsic meaning. In itself, the SP system does not recognise the arithmetic meaning of numbers such as '37' or '652' and will not process them correctly. However, the system has the potential to handle mathematical concepts if it is supplied with patterns representing Peano's axioms or similar information [18] (Chapter 10). As a stop-gap solution, existing technologies may provide whatever arithmetic processing may be required.
In the process of solving these and other problems in the development of the SPTI, it seems likely that the proposed SP Machine (Appendix A.10, next) will be a useful vehicle for the representation and testing of ideas.
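The remark above, that edges may be seen as zones of non-redundant information between uniform areas containing more redundancy, can be illustrated with run-length coding of a single scan line. This is a minimal sketch using a standard compression technique, not a description of how the SP system processes images; the scan line and the function names are hypothetical.

```python
from itertools import groupby


def run_length_encode(scan_line):
    """Replace each uniform run with a (value, length) pair."""
    return [(value, len(list(group))) for value, group in groupby(scan_line)]


def edge_positions(scan_line):
    """Positions where the value changes, i.e. where one run ends and the next begins."""
    return [i for i in range(1, len(scan_line)) if scan_line[i] != scan_line[i - 1]]


# A hypothetical one-dimensional slice through an image: dark, light, dark.
scan_line = "0000000111111110000"

runs = run_length_encode(scan_line)
edges = edge_positions(scan_line)

print(runs)    # three runs describe all 19 pixels
print(edges)   # the two edges are exactly the boundaries between runs
```

Within each run every pixel is predictable from its neighbour, so the run compresses to a single pair; what cannot be compressed away is where the runs begin and end, which is the information carried by the edges.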

Appendix A.10. Future Developments and the SP Machine
In view of the potential of the SPTI in diverse areas (Appendix B), the SPCM appears to hold promise as the foundation for the development of an SP Machine, described in [35], and illustrated schematically in Figure A4.
It is envisaged that the SP Machine will feature high levels of parallel processing and a good user interface. It may serve as a vehicle for further development of the SPTI by researchers anywhere. Eventually, it should become a system with industrial strength that may be applied to the solution of many problems in science, government, commerce, industry, and in non-profit endeavours.
It is envisaged that the best way forward is to develop the SP Machine by porting the SPCM onto a platform which will provide for the application of high levels of parallel processing, and to adapt the SPCM to exploit those high levels of parallel processing. Additionally, there is a need to give the system a good 'friendly' user interface.
Although it is likely that a mature version of the SP Machine will be very much more efficient than the extraordinarily power-hungry and data-hungry DNNs [25] (Section 9), high levels of parallel processing are likely to be needed for relatively demanding operations such as unsupervised learning, especially with 'big data' and the like. It is envisaged that the SP Machine will be entirely open so that researchers anywhere may test the system and develop it, perhaps following the suggestions in [35]. To make things easy for other researchers, the SP Machine may be hosted on one or more of the following platforms:

• A workstation with GPUs providing high levels of parallel processing. Other researchers would need to buy one or more such workstations, and then, on each machine, they may install the open-source software of the SPCM, ready for further development.

• Facilities in the cloud that provide for high levels of parallel processing.

• Since pattern-matching processes in the foundations of the SPCM are similar to the kinds of pattern matching that are fundamental in any good search engine, an interesting possibility is to create the SP Machine as an adjunct to one or more search engines. This would mean that, with search engines that are not open access, permission would be needed to access functions in relevant parts of the search engine, so that those functions may be used within the SP Machine.

Appendix B.1.1. Intelligence-Related Strengths Excluding Reasoning
Most of the aspects of intelligence described here have been demonstrated with the SPCM. In cases where there is merely potential and not actual demonstrations, this is indicated in the text.

• Compression and Decompression of Information. In view of substantial evidence for the importance of IC in HLPC [20], IC should be seen as an important feature of human intelligence. Paradoxical as this may seem, the SPCM provides for decompression of information via the compression of information (Appendix C.7).

• Natural Language Processing. Under the general heading of "Natural Language Processing" are capabilities that facilitate the learning and use of natural languages. These include:
- The ability to structure syntactic and semantic knowledge into hierarchies of classes and sub-classes, and into parts and sub-parts.
- The ability to integrate syntactic and semantic knowledge.
- The ability to encode discontinuous dependencies in syntax such as the number dependency (singular or plural) between the subject of a sentence and its main verb, or gender dependencies (masculine or feminine) in French, where 'discontinuous' means that the dependencies can jump over arbitrarily large intervening structures. Also important in this connection is that different kinds of dependency (e.g., number and gender) can co-exist without interfering with each other.
- The ability to accommodate recursive structures in syntax.
- The production of natural language. A point of interest here is that the SPCM provides for the production of language as well as the analysis of language, and it uses exactly the same processes for IC in the two cases, in the same way that the SPCM uses exactly the same processes for both the compression and decompression of information (Appendix C.7).

• Recognition and Retrieval. Capabilities that facilitate recognition of entities or retrieval of information include:
- The ability to recognise something or retrieve information on the strength of a good partial match between features as well as an exact match.
- Recognition or retrieval within a class-inclusion hierarchy with 'inheritance' of attributes, and recognition or retrieval within a hierarchy of parts and sub-parts.
- 'Semantic' kinds of information retrieval: retrieving information via 'meanings'.

• Planning and Problem Solving. Capabilities here include:
- The ability to plan a route, such as a flying route between cities A and B, given information about direct flights between pairs of cities including those that may be assembled into a route between A and B.
- The ability to solve geometric analogy problems, or analogues in textual form.

• Unsupervised Learning. Chapter 9 of [18] describes how the SPCM may achieve unsupervised learning from a body of 'raw' data, I, to create an SP-grammar, G, and an Encoding of I in terms of G, where the encoding may be referred to as E. At present the learning process has shortcomings summarised in [19] (Section 3.3) but it appears that these problems may be overcome.
In its essentials, unsupervised learning in the SPCM means searching for one or more 'good' SP-grammars, where a good SP-grammar is a set of SP-patterns which is relatively effective in the economical encoding of I via SP-multiple-alignment (Appendix C.1). This kind of learning includes the discovery of segmental structures in data (including hierarchies of segments and subsegments) and the learning of classes (including hierarchies of classes and subclasses).

• Abductive Reasoning. Abductive reasoning is more obviously probabilistic than deductive reasoning: "One morning you enter the kitchen to find a plate and cup on the table, with breadcrumbs and a pat of butter on it, and surrounded by a jar of jam, a pack of sugar, and an empty carton of milk. You conclude that one of your house-mates got up at night to make him- or herself a midnight snack and was too tired to clear the table. This, you think, best explains the scene you are facing. To be sure, it might be that someone burgled the house and took the time to have a bite while on the job, or a house-mate might have arranged the things on the table without having a midnight snack but just to make you believe that someone had a midnight snack. But these hypotheses strike you as providing much more contrived explanations of the data than the one you infer to." [36].

• Probabilistic Networks and Trees. One of the simplest kinds of system that supports reasoning in more than one step (as well as single step reasoning) is a 'decision network' or a 'decision tree'. In such a system, a path is traced through the network or tree from a start node to one of two or more alternative destination nodes depending on the answers to multiple-choice questions at intermediate nodes. Any such network or tree may be given a probabilistic dimension by attaching a value for probability or frequency to each of the alternative answers to questions at the intermediate nodes.

• Reasoning With 'Rules'. SP-patterns may serve very well within the SPCM for the expression of such probabilistic regularities as 'sunshine with broken glass may create fire', 'matches create fire', and the like. Alongside other information, rules like those may help determine one or more of the more likely scenarios leading to the burning down of a building, or a forest fire.

• Nonmonotonic Reasoning. The conclusion that "Socrates is mortal", deduced from "All humans are mortal" and "Socrates is human" remains true for all time, regardless of anything we learn later. By contrast, the inference that "Tweety can probably fly" from the propositions that "Most birds fly" and "Tweety is a bird" is nonmonotonic because it may be changed if, for example, we learn that Tweety is a penguin.

• 'Explaining Away'. This means "If A implies B, C implies B, and B is true, then finding that C is true makes A less credible." In other words, finding a second explanation for an item of data makes the first explanation less credible.

There is also potential in the system for:

• Spatial Reasoning. The potential is described in [37] (Section IV-F.1).

• What-If Reasoning. The potential is described in [37] (Section IV-F.2).

Although SP-patterns are not very expressive in themselves, they come to life in the SPMA framework within the SPCM. Within the SPMA framework, they provide relevant knowledge for each aspect of intelligence mentioned in Appendix B.1.
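The probabilistic decision tree described above under 'Probabilistic Networks and Trees' can be sketched as a nested structure in which each answer at an intermediate node carries a probability. This is a generic illustration and not the SP-pattern representation used in the SPCM; the questions, the probabilities, and the function names are hypothetical.

```python
# Each intermediate node is a dict with a question and its alternative answers;
# each answer maps to (probability, next node). A destination node is a string.
tree = {
    "question": "Does it have feathers?",
    "answers": {
        "yes": (0.3, {
            "question": "Can it fly?",
            "answers": {
                "yes": (0.9, "flying bird"),
                "no": (0.1, "flightless bird"),
            },
        }),
        "no": (0.7, "not a bird"),
    },
}


def trace(node, answers):
    """Follow one path from the start node; return (destination, path probability)."""
    prob = 1.0
    for answer in answers:
        p, node = node["answers"][answer]
        prob *= p
    return node, prob


def leaf_probabilities(node, prob=1.0):
    """Probability of reaching each destination node from the start node."""
    if isinstance(node, str):
        return {node: prob}
    totals = {}
    for p, child in node["answers"].values():
        for leaf, q in leaf_probabilities(child, prob * p).items():
            totals[leaf] = totals.get(leaf, 0.0) + q
    return totals


destination, path_prob = trace(tree, ["yes", "no"])
print(destination, round(path_prob, 3))
print({leaf: round(p, 3) for leaf, p in leaf_probabilities(tree).items()})
```

The probability of a path is the product of the probabilities of the answers along it, and the probabilities of all destination nodes sum to one when the answers at each node are exhaustive.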
As previously noted (Appendix A), the addition of two-dimensional SP patterns to the SPCM is likely to expand the capabilities of the SPTI to the representation and processing of structures in two-dimensions and three-dimensions, and the representation of procedural knowledge with parallel processing.An important additional feature of the SPCM, alongside its versatility in aspects of intelligence and diverse forms of reasoning, and its versatility in the representation and processing of diverse kinds of knowledge, is that there is clear potential for the SPCM to provide for the seamless integration of diverse aspects of intelligence and diverse forms of knowledge, in any combination.This is because those several aspects of intelligence and several kinds of knowledge all flow from a single coherent and relatively simple source: the SPMA framework.
It appears that this kind of seamless integration is essential in any artificial system that aspires to AGI.
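The 'explaining away' pattern listed among the forms of reasoning above can be checked with a small worked calculation. This is a standard probability calculation by enumeration, not the mechanism by which the SPCM models explaining away; the prior probabilities, the 'leak' probability for B occurring without A or C, and the function names are assumptions chosen for illustration.

```python
from itertools import product

P_A = 0.1     # assumed prior probability of explanation A
P_C = 0.1     # assumed prior probability of explanation C
LEAK = 0.05   # assumed probability of B when neither A nor C holds


def joint(a, c, b):
    """Joint probability of one assignment, with A and C independent
    and B certain whenever A or C is true (A implies B, C implies B)."""
    p_causes = (P_A if a else 1 - P_A) * (P_C if c else 1 - P_C)
    p_b = 1.0 if (a or c) else LEAK
    return p_causes * (p_b if b else 1 - p_b)


def prob_a_given(b, c=None):
    """P(A | B=b) or, if c is given, P(A | B=b, C=c), by enumeration."""
    worlds = [
        (a, cc)
        for a, cc in product([True, False], repeat=2)
        if c is None or cc == c
    ]
    numerator = sum(joint(a, cc, b) for a, cc in worlds if a)
    denominator = sum(joint(a, cc, b) for a, cc in worlds)
    return numerator / denominator


p_a_given_b = prob_a_given(b=True)
p_a_given_b_and_c = prob_a_given(b=True, c=True)

print(round(p_a_given_b, 3))         # credibility of A once B is observed
print(round(p_a_given_b_and_c, 3))   # credibility of A once C is also known
```

With these assumed numbers, observing B raises the credibility of A well above its prior, and then learning that C is true returns it to the prior, since C alone accounts for B.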
Figure A5 shows schematically how the SPTI, with SPMA at centre stage, exhibits versatility and seamless integration.

Strong support for the SPTI has arisen, indirectly, from the book Architects of Intelligence by science writer Martin Ford [29]. To prepare for the book, he interviewed several influential experts in AI to hear their views about AI research, including opportunities and problems in the field: "The purpose of this book is to illuminate the field of artificial intelligence, as well as the opportunities and risks associated with it, by having a series of deep, wide-ranging conversations with some of the world's most prominent AI research scientists and entrepreneurs." Martin Ford [29] (p. 2).

In the book, Ford reports what the AI experts say, giving them the opportunity to correct errors he may have made so that the text is a reliable description of their thinking.
This source of information has proved to be very useful in defining problems in AI research that influential experts in AI deem to be significant. This has been important from the SP perspective because, with 17 of those problems and three others (20 in all), there is clear potential for the SPTI to provide a solution.
Since these are problems with broad significance, not micro-problems of little consequence, the clear potential of the SPTI to solve them is a major result from the SP programme of research, demonstrating some of the power of the SPTI.
The paper [25] describes those 20 significant problems and how the SPTI may solve them. The following summary describes each of the problems briefly. Readers are invited to read [25] to see how the SPTI may solve them:

1. The Symbolic Versus Sub-Symbolic Divide. The need to bridge the divide between symbolic and sub-symbolic kinds of knowledge and processing [25] (Section 3).

2. Errors in Recognition. The tendency of DNNs to make large and unexpected errors in recognition [25] (Section 3).

3. Natural Languages. The need to strengthen the representation and processing of natural languages, including the understanding of natural languages and the production of natural language from meanings [25] (Section 5).

4. Unsupervised Learning. Overcoming the challenges of unsupervised learning. Although DNNs can be used in unsupervised mode, they seem to lend themselves best to the supervised learning of tagged examples [25] (Section 6). It is clear that most human learning, including the learning of our first language or languages [32], is achieved via unsupervised learning, without needing tagged examples, or reinforcement learning, or a 'teacher', or other form of assistance in learning (cf. [45]). Incidentally, a working hypothesis in the SP programme of research is that unsupervised learning can be the foundation for all other forms of learning, including learning by imitation, learning by being told, learning with rewards and punishments, and so on.

5. Generalisation. The need for a coherent account of generalisation, under-generalisation (over-fitting), and over-generalisation (under-fitting). Although this is not mentioned in Ford's book [29], there is the related problem of reducing or eliminating the corrupting effect of errors in the data which is the basis of learning [25] (Section 7).

6. One-Shot Learning. Unlike people, DNNs are ill-suited to the learning of usable knowledge from one exposure or experience [25] (Section 8).

7. Transfer Learning. Although transfer learning (incorporating old learning in newer learning) can be done to some extent with DNNs [46] (Section 2.1), DNNs fail to capture the fundamental importance of transfer learning for people, or the central importance of transfer learning in the SPCM [25] (Section 9).

8. Reducing Computational Demands. How to increase the speed of learning in AI systems, and how to reduce the demands of AI learning for large volumes of data, and for large computational resources [25] (Section 10).

9. Transparency. The need for transparency in the representation and processing of knowledge [25] (Section 11).

10. Probabilistic Reasoning. How to achieve probabilistic reasoning that integrates with other aspects of intelligence [25] (Section 12).

11. Commonsense. The challenges of commonsense reasoning and commonsense knowledge [25] (Section 13).

12. Top-Down Strategies. The need to re-balance research towards top-down strategies [25] (Section 14).

13. Self-Driving Vehicles. How to minimise the risk of accidents with self-driving vehicles [25] (Section 15).

14. Compositionality. By contrast with people, and the SPTI, DNNs are not well suited to the learning and representation of such compositional structures as part-whole hierarchies and class-inclusion hierarchies [25] (Section 16).

15. Commonsense Reasoning and Commonsense Knowledge. The challenges of commonsense reasoning and commonsense knowledge [25] (Section 17).

16. Information Compression. Establishing the key importance of IC in AI research [25] (Section 18). There is good evidence that much of HLPC may be understood as IC, and for that reason, IC is fundamental in the SPTI, including the SPCM (Appendix A, Appendix C.4). By contrast, IC receives no mention in [2], and does not receive much emphasis in Schmidhuber's review of DNNs (see, for example, [47] (Sections 4.4, 5.6.3 and 6.7)).

17. A Biological Perspective. Establishing the importance of a biological perspective in AI research [25] (Section 19).

18. Distributed Versus Localist Knowledge. Establishing whether or not knowledge in the brain is represented in 'distributed' or 'localist' form [25] (Section 20).

19. Adaptation. How to bypass the limited scope for adaptation in DNNs [25] (Section 21).

20. Catastrophic Forgetting. Catastrophic forgetting is the way in which, when a given DNN has learned one thing and then it learns something else, the new learning wipes out the earlier learning. This problem is quite different from human learning, where new learning normally builds on earlier learning, although of course we all have a tendency to forget some things. However, one may make a copy of a DNN that has already learned something, and then train it on some new concept that is related to what has already been learned.

• Sustainability. The SPTI has potential for substantial reductions in the very large demands for energy of standard DNNs, and applications that need to manage huge quantities of data such as those produced by the Square Kilometre Array [49]. Where those demands are met by the burning of fossil fuels, there would be corresponding reductions in the emissions of CO2.

• Transparency in Computing. By contrast with applications with DNNs, the SPTI provides a very full and detailed audit trail of all its processing, and all its knowledge is transparent and open to inspection. Additionally, there are reasons to believe that, when the system is more fully developed, its knowledge will normally be structured in forms that are familiar such as class-inclusion hierarchies, part-whole hierarchies, run-length coding, and more. Strengths of the SPTI in these areas are described in [50].

Appendix C. Information Compression in Biology and the SPTI
As its title suggests, this appendix considers the role of IC in biology, especially in HLPC, and in the SPTI.

Appendix C.1. Information compression, Simplicity and Power
In words attributed to the English Franciscan friar William of Ockham: "Entities should not be multiplied beyond necessity". This principle, known as 'Ockham's razor', is commonly understood to mean that a good theory should be simple but not so simple that it says little or nothing that is useful.

• Any good theory may be seen as the product of a process that aims to simplify and integrate observations and concepts across a broad canvas (Appendix E), and this means applying IC to those observations and concepts.
• In all cases, IC may be seen as a process that increases the Simplicity of a body of information, I, by reducing or eliminating redundancy in I, whilst retaining as much as possible of the non-redundant descriptive and explanatory Power of I.
• For any one theory, it may be difficult or impossible to obtain precise values for Simplicity and Power. In cases like that, it may be necessary to use informal estimates.
• Since the range of observations and concepts in I is likely to vary amongst alternative theories in the given area of interest, two or more theories of that area may be compared via some kind of combination of Simplicity, Power, and other strengths or weaknesses. In the example described in Section 3.1, simple measures of those attributes are added together.
• Care should be taken to ensure that the estimates of Simplicity, Power, and other strengths or weaknesses, are derived from a broad base of evidence (what Allen Newell called "a genuine slab of human behaviour," Appendix E), not some trivial corner of the given area of interest.
• Within this framework, two particularly weak kinds of theory may be recognised:
- Any theory that is so general that, superficially, it can describe or explain anything (e.g., 'Because God wills it') should be rejected. In terms of Simplicity and Power, any such theory is weak because it is too simple and correspondingly lacking in Power.
- Any theory that merely redescribes observations without any compression is a weak theory that should be rejected. In terms of Simplicity and Power, such a theory is weak because, without compression, the Simplicity of the theory is poor.
In this paper, a favourable combination of Simplicity and Power, or the potential for such a favourable combination, is what is mainly required in a system for it to qualify as an FDAGI. Of course, when the aim is to achieve AGI, Power must be the power of the system to describe aspects of human intelligence.
Simplicity and Power are the reason for the name 'SP'. However, as with such names as 'IBM' or 'BBC', it is intended that 'SP' should be used as a name and not as an abbreviation for Simplicity and Power.
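
The additive comparison of theories described in this appendix can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Assessment`, the function `rank`, the theory names, and all the scores are hypothetical, and the score ranges in the comments are assumed for the sake of the example rather than taken from the assessments made in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    """Informal scores for one candidate theory (all values hypothetical)."""
    name: str
    simplicity: int  # assumed scale: 0 (poor) to 2 (strong)
    power: int       # assumed scale: 0 (poor) to 2 (strong)
    other: int       # assumed scale: -1 (net weakness) to 1 (net strength)

    def total(self) -> int:
        # Simple additive combination: the three scores are summed.
        return self.simplicity + self.power + self.other


def rank(assessments):
    """Return the assessments ordered from highest to lowest total score."""
    return sorted(assessments, key=lambda a: a.total(), reverse=True)


if __name__ == "__main__":
    # Hypothetical candidates, used only to show the arithmetic.
    candidates = [
        Assessment("Theory A", simplicity=2, power=0, other=0),
        Assessment("Theory B", simplicity=1, power=2, other=1),
    ]
    for a in rank(candidates):
        print(f"{a.name}: total = {a.total()}")
```

In this sketch, 'Theory A' stands for a theory that is very simple but says little, so it ranks below 'Theory B', which gives up some Simplicity in exchange for much more Power.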

Appendix C.2. The Working Hypothesis That IC May Always Be Achieved Via the Matching and Unification of Patterns
A working hypothesis in the SP research is that all kinds of IC may be achieved via ICMUP. Although this is a 'working' hypothesis, there is much supporting evidence: the powerful concept of SPMA may be understood as an example of ICMUP [27]; the SPMA concept seems to underpin several aspects of intelligence (Appendix B.1), including several kinds of probabilistic reasoning; and much of mathematics, perhaps all of it, may be understood in terms of ICMUP (Appendix D and [51]).
In this research, seven main variants of ICMUP are recognised [51] (Sections 5.1-5.7):
• Basic ICMUP. Two or more instances of any pattern may be merged or 'unified' to make one instance [51] (Section 5.1).
• Chunking-With-Codes. Any pattern produced by the unification of two or more instances is termed a 'chunk'. A 'code' is a relatively short identifier for a unified chunk which may be used to represent the unified pattern in each of the locations of the original patterns [51] (Section 5.2).
• Schema-Plus-Correction. A 'schema' is a chunk that contains one or more 'corrections' to the schema. For example, a menu in a restaurant may be seen as a schema that may be 'corrected' by a choice of starter, a choice of main course, and a choice of pudding [51] (Section 5.3).
• Run-Length Coding. In run-length coding, a pattern that repeats two or more times in a sequence may be reduced to a single instance with some indication that it repeats, or perhaps with some indication of when it stops, or even more precisely, with the number of times that it repeats [51] (Section 5.4).
• Class-Inclusion Hierarchies. Each class in a hierarchy of classes represents a group of entities that have the same attributes. Each level in the hierarchy inherits all the attributes from all the classes, if any, that are above it [51] (Section 5.5).
• Part-Whole Hierarchies. A part-whole hierarchy is similar to a class-inclusion hierarchy but it is a hierarchy of part-whole groupings [51] (Section 5.6).
• SP-multiple-alignment. The SPMA concept is described in Appendix A.2 and in [51] (Section 5.7). The SPMA concept may be seen as a generalisation of the other six variants of ICMUP, as demonstrated via the SPCM in [27].
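
Two of the variants listed above, run-length coding and chunking-with-codes, are simple enough to illustrate directly. The following Python is a minimal sketch under stated assumptions, not the method used in the SPCM, which achieves these effects via SP-multiple-alignment. The function names, the fixed chunk size, and the choice of code symbol are all hypothetical.

```python
from collections import Counter
from itertools import groupby


def run_length_encode(symbols):
    """Run-length coding: reduce each run of a repeated symbol to (symbol, count)."""
    return [(sym, len(list(run))) for sym, run in groupby(symbols)]


def run_length_decode(pairs):
    """Invert run_length_encode, showing that no information is lost."""
    return [sym for sym, count in pairs for _ in range(count)]


def chunk_with_code(symbols, size, code):
    """Chunking-with-codes: store the most frequent chunk of `size` symbols
    once, and replace each non-overlapping occurrence of it with `code`.

    Simplification: frequencies are counted over overlapping windows.
    """
    windows = [tuple(symbols[i:i + size]) for i in range(len(symbols) - size + 1)]
    if not windows:
        return list(symbols), {}
    chunk, freq = Counter(windows).most_common(1)[0]
    if freq < 2:
        # Nothing repeats, so unification would not save anything.
        return list(symbols), {}
    encoded, i = [], 0
    while i < len(symbols):
        if tuple(symbols[i:i + size]) == chunk:
            encoded.append(code)
            i += size
        else:
            encoded.append(symbols[i])
            i += 1
    return encoded, {code: chunk}


def expand_codes(encoded, dictionary):
    """Replace each code with its chunk to recover the original sequence."""
    out = []
    for item in encoded:
        out.extend(dictionary.get(item, (item,)))
    return out


if __name__ == "__main__":
    print(run_length_encode("aaabccdddd"))
    encoded, dictionary = chunk_with_code(list("thecatthedog"), 3, "%")
    print(encoded, dictionary)
    print("".join(expand_codes(encoded, dictionary)))
```

In both functions the saving comes from matching repeated patterns and unifying them into a single stored instance, which is the core idea of ICMUP. The decoding functions are included to show that the unification is lossless.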
This list probably does not exhaust the possible variants of ICMUP, but they are the ones that have received most attention so far in the SP programme of research.

go down through time. Thus, far from providing the rungs of a ladder by which psychology gradually climbs to clarity, this form of conceptual structure leads rather to an ever increasing pile of issues, which we weary of or become diverted from, but never really settle." [59] (pp. 2-7).
In the light of what Newell says, the reason that this kind of bottom-up strategy seems always to fail is that a theory that works in one local area rarely generalises to any other local area, or to any high-level view. Thus a persistent focus on low-level observations and concepts, with little or no attention to high-level concepts, makes it difficult or impossible to achieve simplification and integration at high levels of abstraction.

Appendix E.3. The Adoption of a Top-Down Research Strategy in the SP Research
The overarching goal of the SP research is to simplify and integrate observations and concepts in AI, mainstream computing, mathematics, and human learning, perception, and cognition.
In the quest for a general theory of those observations and concepts, the SPTI has been developed via a top-down, breadth-first research strategy with exceptionally wide scope. A clue was provided by the bioinformatics concept of 'multiple sequence alignment' which seemed to have the potential for the desired simplification and integration of concepts across a wide area.
As mentioned in Appendix A.2, the concept of multiple sequence alignment led to the development of the concept of SP-multiple-alignment. Despite its similarity with the concept of multiple sequence alignment, a major programme of work was needed to develop the SP-multiple-alignment concept, including the creation and testing of hundreds of versions of the SPCM, and to explore its range of potential applications.
The SP strategy should help to meet the concerns of Gary Marcus and Ernest Davis: "What's missing from AI today - and likely to stay missing, until and unless the field takes a fresh approach - is broad (or "general") intelligence." [61] (p. 15).

Figure 1. Structure of Soar memories, processing modules, learning modules and their connections.

Figure A4. Schematic representation of the development and application of the SP Machine. Reproduced from Figure 2 in [19], with permission.

Appendix B.1.2. Probabilistic Reasoning
Capabilities here include:
• One-Step 'Deductive' Reasoning. A simple example of modus ponens syllogistic reasoning goes like this:
- If something is a bird then it can fly.
- Tweety is a bird.
- Therefore, Tweety can fly.

Appendix B.1.3. The Representation and Processing of Several Kinds of Intelligence-Related Knowledge

Appendix B.1.4. The Seamless Integration of Diverse Aspects of Intelligence, and Diverse Kinds of Knowledge, in Any Combination

being developed via a bottom-up strategy, and because it is at a relatively early stage in that process, it may be seen to be relatively strong in terms of Simplicity. Although a mature version of DALL•E 2 that embraces several aspects of intelligence is likely to be relatively large, it seems best to stick with the assessment of the Simplicity of DALL•E 2 as it is now. Hence, the Simplicity of DALL•E 2 is hereby assigned a

For this reason, and because DALL•E 2 is still at an early stage of that bottom-up process, its Power with respect to the development of AGI may be judged to be weak.
- While some parts of the project are relevant to the development of AGI (e.g., the creation of "a new AI system that can create realistic images and art from a description in natural language", see Section 2.3), other parts seem to be more focussed on meeting the needs of potential users of the system (e.g., "Safety mitigations we have already developed include: Preventing Harmful Generations ... Curbing Misuse ... Phased Deployment Based on Learning ..."), see Section 2.3. More generally, the project is not tightly focussed on modelling aspects of human intelligence. However, it would be perverse to give DALL•E 2 a score of 0 for Power because the kinds of things it can do are undoubtedly impressive. Hence it has seemed most appropriate to give it a Power score of 2, on the scale 0 to 2.
• Other strengths or weaknesses, Score = −1. DALL•E 2 is a 'transformer' model [24], and a transformer model is a kind of DNN (Section 2.2), and DNNs have well known shortcomings compared with people and the SPTI, most of which are summarised in Appendix B.2.2. There is more detail about the shortcomings of DNNs in [25].
• Power, Score = 2. The intelligence-related capabilities of the SPTI (including the SPCM), which are substantial, are described in: Appendices B.1 and B.2.1, with indications of a few exceptions not yet demonstrated in the SPCM; and Appendix B.2.2. In short, the SPTI is strong in its Power to model aspects of human intelligence, so it has accordingly been assigned the highest available score of 2.
• Other Strengths or Weaknesses, Score = 1. Largely because of substantial evidence for the importance of IC in HLPC [20], IC is central in the structure and workings of the SPCM (Appendix C.5). In addition, a major discovery in the SP programme of research is the powerful concept of SP-multiple-alignment (Appendix A.2). This is largely responsible for the versatility of the SPCM across several aspects of intelligence, summarised above, and for the potential of the SPCM in other areas (Appendices B.2.3 and D).
• Development of Intelligence in Autonomous Robots. The SPTI opens up a radically new approach to the development of intelligence in autonomous robots [37].
• Commonsense Reasoning and Commonsense Knowledge. Largely because of research by Ernest Davis and Gary Marcus (see, for example, [41]), the challenges in this area of AI research are now better known. Preliminary work shows that the SPTI has promise in this area [42].
Both Artificial and Natural. The SPTI opens up a new approach to the development of computer vision and its integration with other aspects of intelligence, and it throws light on several aspects of natural vision [28,44].

Appendix B.2.2. The Clear Potential of the SPTI to Solve 20 Significant Problems in AI Research
prior knowledge may help in the learning of the new concept. Additionally, one may provide a very large DNN, divided into sections, and train each section on a different concept [46].

Appendix B.2.3. Other Potential Benefits and Applications of the SPTI, with Less Relevance to Intelligence
This section describes other potential benefits and applications of the SPTI that are less closely related to AI. They include:
• Overview of Potential Benefits and Applications. As mentioned above, several potential areas of application of the SPTI are described in [39]. The ones that are less directly relevant to AI include: the simplification and integration of computing systems; software engineering; the representation of knowledge IC; bioinformatics; the detection of computer viruses; and data fusion.
• Big Data. The SPTI has potential in helping to solve several problems with big data [48]. These include: overcoming the problem of variety in big data; the unsupervised learning or discovery of 'natural' structures in data; the interpretation of data; the analysis of streaming data; making big data smaller; economies in the transmission of data; managing errors and uncertainties in data; and visualisation of knowledge structures.