How the SP System May Promote Sustainability in Energy Consumption in IT Systems

Abstract: The SP System (SPS), referring to the SP Theory of Intelligence and its realisation as the SP Computer Model, has the potential to reduce demands for energy from IT, especially in AI applications and in the processing of big data, in addition to reductions in CO2 emissions when the energy comes from the burning of fossil fuels. The biological foundations of the SPS suggest that, with further development, the SPS may approach the extraordinarily low (20 W) energy demands of the human brain. Some of these savings may arise because, like people, the SPS may learn usable knowledge from a single exposure or experience. By comparison, deep neural networks (DNNs) need many repetitions, with much consumption of energy, for the learning of one concept. Another potential saving with the SPS is that, like people, it can incorporate old learning in new. This contrasts with DNNs, where new learning wipes out old learning ('catastrophic forgetting'). Other ways in which the mature SPS is likely to prove relatively parsimonious in its demands for energy arise from the central role of information compression (IC) in the organisation and workings of the system: by making data smaller, there is less to process; the efficiency of searching for matches between patterns can be improved by exploiting the probabilities that arise from the intimate connection between IC and concepts of probability; and, with SPS-derived 'Model-Based Codings' of data, there can be substantial reductions in the demand for energy in transmitting data from one place to another.


Introduction
This paper describes the potential of the SP System (SPS), referring to the SP Theory of Intelligence and its realisation in the SP Computer Model (SPCM), for reductions in demands for energy, in comparison with current AI and IT systems. When the energy comes from the burning of fossil fuels, there is potential for corresponding reductions in emissions of CO2. The origin of the name 'SP' is described in Section 1.1.2, below.
The SPS is introduced in Section 2, with more detail in Appendix A. Other sections describe aspects of the system with potential for substantial reductions in the energy required for AI and other IT processing.
There is some overlap of content with a previously published paper about big data [1], but the focus of this paper is on sustainability, not big data, and the audiences for the two papers are likely to be largely different.

Chasms to Be Bridged
At least two features of AI and other IT systems today raise concerns about energy consumption.

Energy Demands of Deep Neural Networks
The dominant paradigm in AI research today is 'deep neural networks' (DNNs), largely because of some impressive results that have been achieved with them. These include:

• Beating the world's best human players at the game of Go (see, for example: AlphaGo, https://tinyurl.com/yaxg9vz4; Mastering the game of Go with deep neural networks and tree search, https://tinyurl.com/12ge3oht; both retrieved on 1 March 2021).

• Solving problems in the folding of proteins that are widely recognised to be very difficult (see, for example: AlphaFold: Using AI for scientific discovery, https://tinyurl.com/1bnwivkp; AI protein-folding algorithms solve structures faster than ever, https://tinyurl.com/3bt77zo4; both retrieved on 1 March 2021).
However, it is widely recognised that results like these have been achieved at a heavy cost in terms of energy, and at a heavy cost in terms of CO2 emissions when the energy comes from the burning of fossil fuels. For example, Emma Strubell and colleagues [4] found that the process of training a large AI model can emit more than 626,000 pounds of carbon dioxide, equivalent to nearly five times the lifetime emissions of the average American car, including the manufacture of the car itself.
This extraordinary figure for emissions of CO2 from DNNs, and the correspondingly large demands for energy, contrasts sharply with the way people can do similar things relatively fast and with a brain that runs on only about 20 watts of power [4][5][6].
In terms of sustainability, this chasm between the energy demands of DNNs and those of the human brain represents a huge challenge, one which is in urgent need of an answer in the now widely adopted quest to cut worldwide emissions of fossil carbon to zero by 2050, preferably sooner. Even if the energy to run AI systems all comes from renewable sources, we should not be unnecessarily wasteful.
This paper describes reasons for believing that the SPS has the potential to bridge this chasm and meet the challenge just described.

Communications and Big Data
In their book Smart Machines [7], John Kelly and Steve Hamm write: "The Square Kilometre Array is one of the most ambitious scientific projects ever undertaken. Its organizers plan on setting up a massive radio telescope made up of more than half a million antennas spread out across vast swaths of Australia and South Africa." John E. Kelly III and Steve Hamm ([7], p. 62).
"The SKA is the ultimate big data challenge. ... The telescope will collect a veritable deluge of radio signals from outer space, amounting to fourteen exabytes of digital data per day ..." (ibid., p. 63).
These enormous quantities of data from the SKA will create huge problems in the management of data, even for the smartest or most powerful of smart machines. One of the most surprising of those problems is that the amount of energy required merely to move even a small part of these data from one place to another is proving to be a significant headache for the SKA project ([7], p. 65, p. 92) and other projects of that kind. (However, see Sections 7.4 and 7.4.7.)
More generally, it is intended that 'SP' should be treated as a name, without any need to expand the letters in the name or explain the origin of the letters, as with names such as 'IBM' or 'BBC'. This is because:

• The SPS is intended, in itself, to combine Simplicity with descriptive and explanatory Power.

• In addition, the SPS works entirely by the compression of information, and this may be seen as a process that creates structures that combine conceptual Simplicity with descriptive and explanatory Power.

Introduction to the SPS
The SPS is the product of a lengthy programme of research, seeking to simplify and integrate observations and concepts across AI, mainstream computing, mathematics, and human learning, perception, and cognition.
An important unifying principle in the SPS is that all kinds of processing are achieved via IC, and IC is central in how knowledge in the system is organised. The main reason for this strong focus on IC is the extensive evidence for the importance of IC in human learning, perception, and cognition [9]. Another reason, which has emerged with the development and testing of the SPCM, is that IC as it is incorporated in the SPCM provides for the modelling of diverse aspects of intelligence.
An important discovery from this research is the concept of SP-multiple-alignment (SPMA), a construct within the SPCM (Appendix A.3) which is largely responsible for: the versatility of the SPCM across aspects of AI, including diverse forms of reasoning (Appendices A.5.1 and A.5.2); the versatility of the SPCM in the representation of diverse forms of knowledge (Appendix A.5.3); and the seamless integration of diverse aspects of AI, and diverse forms of knowledge, in any combination (Appendix A.5.4).
In addition, it appears that the SPMA construct is largely responsible for the potential of the SPS for several benefits and applications (Appendix A.6).
Appendix A describes the SPS in outline, including descriptions of how the system achieves IC: via the matching and unification of patterns (Appendix A.2), via the building of SPMAs (Appendix A.3), and via the creation of 'SP-grammars' (Appendix A.4).

Biological Foundations
In connection with the potential in the SPS for reductions in AI-related demands for energy, it is relevant to say that, in several dimensions, the system has foundations in biology, neurology, and psychology ("biological foundations" for short). As the SPS becomes progressively more mature, and especially in the development of SP-Neural (Appendix A.7), the biological foundations may be helpful in bringing the energy demands of the SPS down towards the extraordinarily low 20 watts of the human brain.
Biological foundations of the SPS may be seen in the following areas:

• Information compression. The SP programme of research has, from the beginning, been tightly focussed on the importance of IC in the workings of brains and nervous systems, and how they organise knowledge (Section 7, [9]). Hence, IC is central in the workings of the SPS. The idea that IC might be significant in human perception and cognition was pioneered by Fred Attneave [10,11], Horace Barlow [12,13], and others, and has been a subject of research ever since.

• Natural selection. In human biology, and in the biology of non-human animals, it seems likely that IC would play a prominent role in natural selection because: (1) IC can speed up the transmission of a given body of information, I, in a given bandwidth, or it requires less bandwidth to transmit I at a given speed; (2) likewise, IC can reduce the storage space required for a given body of information, I, or it can increase the amount of information that can be accommodated in a given store.

• Research in language learning. The SP programme of research is founded on earlier research developing computer models of language learning [14] and incorporates many insights from that research, especially the importance of IC in language learning.

• Cell assembly and pattern assembly. Although unsupervised learning in the SPS is entirely different from 'Hebbian' learning (see Appendix A.4), Donald Hebb's [15] concept of a 'cell assembly' is quite similar to the SP-Neural concept of a pattern assembly (([16], Chapter 11), [17]).

• Localist v distributed representation of knowledge. The weight of evidence now seems to favour the 'localist' kind of knowledge representation adopted in the SPS, and not the 'distributed' style of knowledge representation adopted in DNNs ([18], pp. 461-463).

• SP-Neural. Although the SP-Neural version of the SPS (Appendix A.7) is still embryonic, there appears to be considerable potential for the expression of IC via neural equivalents of ICMUP, SPMA, and SP-grammars, which are fundamental in the abstract version of the SPS.

One-Shot Learning
A striking difference between the SPS and DNNs is that the former, like people, is capable of learning usable knowledge from a single exposure or experience (see Section 8 in [19]), whereas a DNN needs many repetitions to learn any one concept well enough for it to enter into any other computation.
One-shot learning is illustrated schematically in Figure 1 (A). The yellow rectangle represents the system's memory store. The red disc at the top represents input after it has been compressed from the much larger size represented by the broken-line circle. The red disc below represents the same information after it has been stored. In terms of human cognition, this may be a little misleading because it is likely that some compression is done in sense organs, and more is done on the way to storage and in the memory stores themselves.

Figure 1. Schematic representation of one-shot learning (described in Section 4), transfer learning (Section 5), catastrophic forgetting (Section 6), and information compression (Section 7). How the figure illustrates those concepts is explained in each of those sections.
An example of one-shot learning is the way that one experience of touching something hot may be enough to teach a child to be careful with hot things, and that learning may persist for life.
Again, in any ordinary conversation between two people, each of them is absorbing what the other person is saying, and this is normally without the need for repetition.
When the SPS learns, the first step is to read in 'New' information from the environment in essentially the same way as a tape recorder. When it is stored, this information may enter directly into varied kinds of analysis, as illustrated by the parsing example in Figure A2 (Appendix A.3), without the need for any complicated preparation. This is quite different from DNNs, where large-scale processing may be needed to learn each such 'simple' concept as a buttercup, a dog, or a cow, and to bring each concept to a level of development where it may be used in other computations.
Clearly, this ability of the SPS to learn things directly without complex analysis is likely to promote parsimony in energy consumption compared with DNNs.

Transfer Learning
"Humans can learn from much less data because we engage in transfer learning, using learning from situations which may be fairly different from what we are trying to learn." Ray Kurzweil ([20], p. 230).
Anyone who knows how to play ice hockey is likely to adapt fairly easily to playing ordinary hockey.If you know how to ride a bicycle (two wheels) you will probably have little difficulty in riding a tricycle (three wheels).In general, we can, in 'transfer learning', use knowledge from one situation to help us learn things in other situations.
Transfer learning is illustrated schematically in Figure 1 (B), where incoming information shown in red is combined with already-stored information shown in blue.As before, the incoming information has been compressed but this is not shown in the figure.
The B figure is intended to show that, in general, both the incoming New information and the pre-existing Old information with which it is combined remain complete, without any loss of information. This is illustrated in the example described in Appendix A.4.3 and introduced next.
Although the SPS can learn new things from scratch without any stored knowledge (Appendix A.4.2), transfer learning would normally account for the majority of learning in the SPS. A simple example is described in Appendix A.4.3. Here, an already-known Old SP-pattern, 'A 3 t h a t b o y r u n s #A', is matched with a New SP-pattern, 't h a t g i r l r u n s', with further processing outlined in that section.
In this case, the overall result is the creation of structures that give full weight to both the already-known Old SP-pattern, and the recent New SP-pattern, in accordance with the concept of transfer learning.
In terms of energy efficiency, transfer learning in both people and the SPS can mean substantial advantages because there would be little or no need to relearn things that have already been learned (Section 6).There should be corresponding savings in energy consumption.
Of course, people can and do forget things.This may be a simple defect in people compared with computers, or even records on paper, vellum, and so on.However, it may have a function in human cognition, as part of heuristic search, as described in Appendices A.3 and A.4.5.

Catastrophic Forgetting
An aspect of DNNs which has a bearing on energy consumption is the phenomenon called 'catastrophic forgetting': in the standard model for DNNs, the learning of any one concept wipes out any previous learning. This is illustrated schematically in Figure 1 (C), where incoming information (shown in red) pushes out, or otherwise destroys, already-stored information (shown in blue). For the avoidance of any confusion, this is not a feature of the SPS, but it is a feature of the standard DNN model. This loss of stored information in DNNs arises because learning is achieved by adjusting the weights of links between the layers of the DNN, and a set of weights which is right for one concept will not be right for any other concept. Of course, one could have a new DNN for every new concept but, considering the number of concepts required in any realistic AI system, this solution is unlikely to be satisfactory.
To the extent that a new DNN for every concept might be regarded as practical, and considering the high energy demands of DNNs in learning one concept, the energy demands of any kind of multi-DNN for the learning of many concepts would be off the scale. Since the SPS does not suffer from catastrophic forgetting and can learn many concepts without any interference among them, the SPS does not suffer from corresponding excesses in energy demands.
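The overwriting of weights just described may be illustrated with a deliberately tiny toy model, far simpler than any real DNN; the tasks, targets, and learning rate below are invented purely for illustration:

```python
# Toy illustration of catastrophic forgetting (not a real DNN, and not
# part of the SPS): one weight is fitted to task A, then refitted to
# task B; the second fit overwrites the first, so task A is 'forgotten'.

def fit(w, examples, lr=0.5, epochs=100):
    """Crude gradient-style training of a single weight on (x, target) pairs."""
    for _ in range(epochs):
        for x, target in examples:
            w += lr * (target - w * x) * x
    return w

def loss(w, examples):
    return sum((target - w * x) ** 2 for x, target in examples)

task_a = [(1.0, 1.0)]    # 'concept A': input 1.0 should map to +1
task_b = [(1.0, -1.0)]   # 'concept B': the same input should map to -1

w = fit(0.0, task_a)
loss_a_before = loss(w, task_a)   # near zero: A has been learned

w = fit(w, task_b)                # training on B reuses the same weight...
loss_a_after = loss(w, task_a)    # ...so performance on A collapses

assert loss_a_before < 1e-6
assert loss_a_after > 1.0
```

A system that stores concepts as separate structures, as the SPS does with SP-patterns, avoids this kind of interference by construction.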

Information Compression
Aspects of how IC in the SPS may help in saving energy are discussed in the following subsections.

IC and Reducing the Size of Data
A rather obvious way in which IC might help reduce the computational demands of IT is in reducing the size of data. With regard to the difficulty of moving SKA-generated big data from one place to another (Section 1.1.2), if the size of the immovable bodies of data can be reduced, it may become feasible to move them, although it appears that this has not yet been examined in any detail.
Naturally, such a possibility will vary with the nature of the data.With the SKA, we may guess that levels of redundancy will be high, and that relatively high levels of compression may be achieved.In that case, IC may indeed be sufficient to overcome the problem of moving data from one place to another, and there would be corresponding savings in energy consumption.However, there are potentially much better answers described in the next three subsections.

IC and Probabilities
Another potential benefit of IC in the saving of energy, perhaps more surprising, arises from the intimate connection that exists between IC and concepts of inference and probability.

IC and Probabilities Are Two Sides of the Same Coin
These ideas were pioneered by Ray Solomonoff in the development of his Algorithmic Probability Theory (APT) [21,22]. APT relates to Algorithmic Information Theory (AIT), where the information content of a body of data is the length of the shortest (idealised) computer program that anyone has been able to find which will generate those data [23]. Then, in APT, that shortest computer program provides the most probable hypothesis about the creation of the original data.
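The core idea can be sketched with a toy calculation: if a candidate hypothesis compresses the given data to a code of L bits, APT assigns it a weight proportional to 2^-L. The hypothesis names and code lengths below are invented purely for illustration:

```python
# Toy illustration of Solomonoff's idea that shorter descriptions are
# more probable: a hypothesis with code length L gets weight 2 ** -L.

def probabilities(code_lengths):
    """Map {hypothesis: code length in bits} to normalised probabilities."""
    weights = {h: 2.0 ** -length for h, length in code_lengths.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Three hypothetical programs that all reproduce the same data.
lengths = {"short_program": 10, "medium_program": 12, "long_program": 20}
probs = probabilities(lengths)

# The shortest program is the most probable hypothesis...
assert max(probs, key=probs.get) == "short_program"
# ...and a program 2 bits shorter is exactly 2 ** 2 = 4 times more probable.
assert abs(probs["short_program"] / probs["medium_program"] - 4.0) < 1e-9
```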
The intimate connection that exists between IC and concepts of inference and probability makes sense in terms of the three aspects of IC described in Appendix A:

• IC via the matching and unification of patterns (Appendix A.2). If two or more patterns match each other, it is not hard to see how, in subsequent processing, the beginning of one pattern may lead to a prediction that the remainder is likely to follow. For example, if we know that 'black clouds rain' is a recurring pattern, then if we see 'black clouds', it is natural to predict that 'rain' is likely to follow.

• IC via SP-multiple-alignment (Appendix A.3). When the SPCM encounters a New SP-pattern like 't h e a p p l e s a r e s w e e t', it is likely to start building an SPMA like the one shown in Figure A2. As it proceeds, it is guided by the kinds of probabilistic inferences just mentioned. These apply at any and all levels in the hierarchy of structures: at the level of words, at the level of phrases, and at the level of sentences.

• IC via unsupervised learning (Appendix A.4). Unsupervised learning in the SPCM creates one or more SP-grammars which are effective in the compression of a given set of New SP-patterns. It is this process which ensures that, in IC via SPMA, the Old SP-patterns and the values for IC and probability accord with the DONSVIC principle (Appendix A.4.8).
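The first of these aspects can be sketched very simply. The following fragment, which stands in for the much richer processing of the SPCM, unifies repeated patterns into a single stored copy and then uses a partial match to predict what is likely to follow, echoing the 'black clouds rain' example above:

```python
# Minimal sketch of IC via the matching and unification of patterns:
# repeated patterns are stored once (with a count), and a partial match
# to a stored pattern predicts the remainder.

def unify(patterns):
    """Store each distinct pattern once, with a count of its occurrences."""
    store = {}
    for p in patterns:
        store[p] = store.get(p, 0) + 1
    return store

def predict(store, prefix):
    """Given the start of a known pattern, predict what is likely to follow."""
    for pattern, _count in sorted(store.items(), key=lambda kv: -kv[1]):
        if pattern.startswith(prefix):
            return pattern[len(prefix):].strip()
    return None

observations = ["black clouds rain"] * 5 + ["clear sky sun"] * 2

store = unify(observations)
assert len(store) == 2                            # 7 observations, 2 patterns
assert predict(store, "black clouds") == "rain"   # prediction from a prefix
```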

Probabilities and Saving Energy
Probabilities can save energy by guiding the search for matching patterns to the areas where it can be most fruitful.As a simple example, if we are searching for strawberry jam, we are more likely to strike lucky, and to use less energy, if we search in a supermarket than if we search in a car sales showroom.

Processing in Two Stages
Another way in which IC can help to save energy is by splitting the IC process into two phases.
The general idea is to get some of the energy-demanding processing done in Phase 1, and then to carry on with less powerful computers that have been supplied with copies of the SP-grammar from Phase 1. Provided that the data processed in Phase 2 are substantially larger than the data for Phase 1, there is potential for an overall saving of energy.
In more detail:

1. Phase 1: Create a grammar for data of a given type.

• Choose a largish sample of data which is representative of the kinds of data to be processed.

• Process the sample via unsupervised learning within the SPCM to create one or two 'good' SP-grammars for those representative data. This stage is relatively demanding and may be done on a relatively powerful computer or SP Machine (Appendix A.8).

2. Phase 2: Process one or more new streams of data. Overall, the data for Phase 2 should be very much larger than the data for Phase 1.

• Use computers of relatively low power that have each been supplied with the SP-grammar from Phase 1.

• Provided each stream is not too large, it should be possible to achieve useful analyses of the data in terms of the SP-grammar.

• The analyses produced by this processing need not be simply parsings like the one shown in Figure A2. They may be any or all of the kinds of processing described in Appendix A.5.
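The two phases may be sketched in miniature as follows, with a simple dictionary of frequent words standing in for a learned SP-grammar; the sample text, the streams, and the code format are invented for illustration only:

```python
# Rough sketch of the two-phase scheme. Phase 1 is the (notionally)
# expensive step, run once on a representative sample; Phase 2 applies
# the learned 'grammar' cheaply to much larger streams of data.

from collections import Counter

def phase1_build_grammar(sample, n_codes=10):
    """Phase 1: learn short codes for the most frequent words in a sample."""
    common = Counter(sample.split()).most_common(n_codes)
    return {word: f"#{i}" for i, (word, _) in enumerate(common)}

def phase2_encode(stream, grammar):
    """Phase 2: encode a new stream using the already-learned grammar."""
    return " ".join(grammar.get(word, word) for word in stream.split())

sample = "the cat sat on the mat and the dog sat on the rug"
grammar = phase1_build_grammar(sample)      # done once, on a powerful machine

stream = "the cat and the dog sat on the mat " * 100
encoded = phase2_encode(stream, grammar)    # done cheaply, per stream

# The one-off learning cost is amortised across many streams, and each
# encoded stream is smaller than the raw stream.
assert len(encoded) < len(stream)
```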

Model-Based Coding
The problem of communication described in Section 1.1.2 may be solved, or at least reduced, via a new approach to old ideas: 'analysis/synthesis' and, more specifically, the relatively challenging idea of 'Model-Based Coding'.
Analysis/synthesis has been described by Khalid Sayood like this: "Consider an image transmission system that works like this. At the transmitter, we have a person who examines the image to be transmitted and comes up with a description of the image. At the receiver, we have another person who then proceeds to create that image. For example, suppose the image we wish to transmit is a picture of a field of sunflowers. Instead of trying to send the picture, we simply send the words 'field of sunflowers'. The person at the receiver paints a picture of a field of sunflowers on a piece of paper and gives it to the user. Thus, an image of an object is transmitted from the transmitter to the receiver in a highly compressed form." Khalid Sayood ([24], p. 592).
This approach works best with the transmission of speech, probably because the physical structure and properties of the vocal cords, tongue, teeth, and so on, help in the process of creating an analysis of any given sample of speech and in any synthesis of speech that may be derived from that analysis.However, things are more difficult with images, especially if they are moving.
The more ambitious concept of Model-Based Coding was described by John Pierce in 1961 like this: "Imagine that we had at the receiver a sort of rubbery model of a human face. Or we might have a description of such a model stored in the memory of a huge electronic computer. First, the transmitter would have to look at the face to be transmitted and 'make up' the model at the receiver in shape and tint. The transmitter would also have to note the sources of light and reproduce these in intensity and direction at the receiver. Then, as the person before the transmitter talked, the transmitter would have to follow the movements of his eyes, lips and jaws, and other muscular movements and transmit these so that the model at the receiver could do likewise." John Pierce ([25], Location 2278).
At the time this was written, it would have been impossibly difficult to make things work as described. Pierce says: "Such a scheme might be very effective, and it could become an important invention if anyone could specify a useful way of carrying out the operations I have described. Alas, how much easier it is to say what one would like to do (whether it be making such an invention, composing Beethoven's tenth symphony, or painting a masterpiece on an assigned subject) than it is to do it." ([25], Locations 2278-2287).
Even today, Pierce's vision is a major challenge. However, there appears to be a way forward via the development of the SPCM, described in the rest of this section. With some development of the SPCM, especially the generalisation of SP-patterns to accommodate information in two or three dimensions, the SPCM has the potential to be very effective in the lossless transmission of big data and in lossless communications via the Internet.

Using an SP-Grammar for the Efficient Transmission of Data
In outline, Model-Based Coding may be made to work as shown in Figure 2.There would be two main elements to the scheme: (1) learning of an abstract description or SP-grammar ('G') for the kind of information to be transmitted; and (2) using G for the efficient transmission of information from A ('Alice') to B ('Bob').

Unsupervised Learning of G
The learning in SP-based Model-Based Coding would be 'unsupervised', meaning learning directly from data without assistance of any kind of "teacher", or the labelling of examples, or rewards or punishments, or anything equivalent.
A strength of the SPS in this connection is that the SP programme of research grew out of earlier research on the unsupervised learning of language [14], and the entire SPCM revolves around learning that is unsupervised.
As in Section 7.3, learning would normally be done independently of any specific transmission, it would be done by a relatively powerful computer, and with a relatively large sample of the kind of data that is to be transmitted, such as a large collection of TV programmes.

Alice and Bob Both Receive Copies of G
Alice and Bob would each receive a copy of G. For example, G may be installed on every new computer, every new smartphone, and in every TV set, and it may also be made available for downloading.
In the transmission of any one body of information ('D'), such as one TV programme, D would first be processed by Alice in conjunction with G to create an 'encoding' ('E') which would describe D in terms of the entities and abstract concepts in G. The encoding, E, would then be transmitted to Bob, who would use it, in conjunction with his own copy of G, to reconstruct D. Provided that Alice and Bob have the same G, the version of D that is created by Bob should be exactly the same as the version of D that was transmitted by Alice, without loss of information.
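The scheme can be sketched in a few lines, with a small shared dictionary of recurring phrases standing in for a learned SP-grammar G; the phrases and codes are invented for illustration, and a real G would of course be far richer:

```python
# Minimal sketch of the Alice-and-Bob scheme: both hold the same G;
# Alice transmits only the small encoding E; Bob reconstructs D exactly.

G = {
    "#0": "weather forecast for tomorrow: ",
    "#1": "sunny with light winds",
    "#2": "heavy rain expected",
}
ENCODE = {phrase: code for code, phrase in G.items()}

def alice_encode(d):
    """Replace each phrase known to G with its short code."""
    for phrase, code in ENCODE.items():
        d = d.replace(phrase, code)
    return d

def bob_decode(e):
    """Expand each code back into its phrase, using the same G."""
    for code, phrase in G.items():
        e = e.replace(code, phrase)
    return e

D = "weather forecast for tomorrow: sunny with light winds"
E = alice_encode(D)

assert bob_decode(E) == D   # lossless reconstruction of D from E and G
assert len(E) < len(D)      # E is much smaller than D
```

A real scheme would need codes that cannot clash with the content of the data, a detail glossed over in this toy.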

E for Any Given D Would Normally Be Very Small Compared with D
Since E would normally be very small compared with D, there would, with one qualification, normally be a large saving in the amount of information to be transmitted, compared with the transmission of raw data. Also, for reasons given below, it is likely that E would, with the same qualification, normally be very small compared with what would be transmitted using ordinary compression methods such as LZ, JPEG, or MPEG, without the benefit of Model-Based Coding.
The qualification is that any given G would be used for the transmission of many different Ds.If G is only used once or twice, any saving is likely to be relatively small.

Model-Based Coding Compared with Standard Compression Methods
The main differences between Model-Based Coding and alternative schemes using ordinary compression methods are these:

1. Any 'learning' with ordinary compression methods is part of the encoding stage, not an independent process.

2. Any such learning with ordinary compression methods is normally relatively unsophisticated and designed to favour speed of processing on low-powered computers rather than high levels of information compression.

3. In addition, if there is any 'learning' with ordinary compression methods, Alice transmits both G and E together, not E by itself. As we shall see, this is likely to mean much smaller savings than if E is transmitted alone.

4. In some versions of MPEG compression, Alice and Bob may be provided with some elements of G, such as the structure of human faces or bodies, but these are normally hard coded and not learned. Any learning in this case appears to be within a framework that lacks generality, is restricted to such things as faces or bodies, and is without the potential for unsupervised learning of a wide variety of entities and concepts (see, for example, [26][27][28]).
In general, there are likely to be relatively large gains in efficiency in transmission with Model-Based Coding compared with transmission with ordinary methods for information compression.However, since the year 2000, few if any researchers have been conducting research on Model-Based Coding, perhaps because of the difficulties that John Pierce anticipated.

The Potential of the SPS for Model-Based Coding
To develop transmission of information via Model-Based Coding as outlined above, the SPS provides a promising way forward.This system has clear potential to provide the main functions that are needed: unsupervised learning of G; encoding of D in terms of G to create E; and lossless recreation of D from E and G.
If the SPS is being used by Alice as a means of transmitting information economically to Bob, then, with a previously learned G playing the part of Old knowledge and a given body of information (D) playing the part of New information, the encoding created by the SPS may play the part of E in the transmission of D, as described above.
Regarding the first of the functions mentioned above, unsupervised learning of G, the SP computer model has already demonstrated unsupervised learning of plausible generative grammars for the syntax of English-like artificial languages, including the learning of segmental structures, classes of structure, and abstract patterns ([16], Chapter 9). With non-linguistic or 'semantic' forms of knowledge, the system has clear potential to learn such things as class hierarchies, class heterarchies (meaning class hierarchies with cross-classification), part-whole hierarchies, and other forms of knowledge ([16], Section 9.5).

Concluding Remarks about Model-Based Coding
Model-Based Coding has great potential to reduce the volumes of data that need to be transmitted in moving big data from one place to another or in communications via the Internet.
Instead of transmitting a 'grammar' and, at the same time, an 'encoding' of the data to be transmitted in terms of the grammar (which, with minor deviations, is what is needed with ordinary compression methods), it is only necessary, most of the time, to transmit a relatively small encoding of the data. This advantage of Model-Based Coding arises from the fact that, in contrast with the use of ordinary compression methods in the transmission of data, both Alice and Bob are equipped with a grammar for the kind of data that is to be transmitted.
Preliminary trials indicate that the volume of information to be transmitted with Model-Based Coding may be less than 6% of the volume of information to be transmitted with ordinary compression methods.
Until recently, it has not been feasible to convert John Pierce's vision into something that may be applied in practice.Now, with the development of the SPS, there is clear potential to realise the three main functions that will be needed: unsupervised learning of a grammar for the kind of data that is to be transmitted; the encoding of any one example of such data in terms of the grammar; and decoding of the encoding to retrieve the given example.
It appears now to be feasible to develop at least an approximation to these capabilities within the foreseeable future.By contrast with other work on Model-Based Coding, unsupervised learning in the SPS has the potential to learn what will normally be the great diversity of entities and concepts that are implicit in the data.
With these developments, big data may glide quickly and efficiently from one place to another, without the need for massive bandwidth, and without needing the output of a small power station to haul it on its way.In addition, there may be less need to worry about possible shortages of bandwidth on the Internet or shortages of energy to power the Internet.

Conclusions
The SP System (SPS), meaning the SP Theory of Intelligence and its realisation in the SP Computer Model, is an AI system under development with potential to cut energy demands in IT systems such as those for deep neural networks (DNNs) and the management of big data. When the energy comes from the burning of fossil fuels, there is potential for corresponding reductions in CO2 emissions.
The biological foundations of the SPS suggest that with further development, the SPS may approach the extraordinarily low (20 watt) energy demands of the human brain.
Any such achievement would be partly because the SPS, like people, may learn usable knowledge from a single exposure or experience (one-shot learning). By contrast, DNNs need many repetitions for the learning of one concept.
Again, the SPS, like people, can incorporate old learning in new (transfer learning), in contrast to DNNs where new learning wipes out old learning ('catastrophic forgetting').
Other ways in which the mature SPS is likely to prove relatively parsimonious in its demands for energy arise from the central role of information compression (IC) in the organisation and workings of the system:

•
IC makes data smaller, so there is less to process.

•
The close connection between IC and concepts of probability means that there are probabilities that can be exploited to improve the efficiency of searching for matches between patterns.

•
Model-Based Coding, described by John Pierce in 1961, may become a reality with the development of an industrial-strength SP Machine:
- With a relatively powerful computer, create an SP-grammar from, for example, a collection of TV programmes.
- Distribute the SP-grammar to TV transmitters and to many computerised TV receivers.
- For each programme to be transmitted, Alice first encodes it in terms of the SP-grammar; the relatively small encoding is then transmitted; and, finally, Bob decodes the encoding in terms of the SP-grammar to recreate the programme exactly.
- The greatest savings arise when the creation and distribution of the SP-grammar is relatively infrequent compared with the uses of the SP-grammar in the encoding and decoding of TV programmes and the like.
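The link between IC and probability in the second bullet can be sketched with Shannon's relation between the probability of a pattern and the length of an optimal code for it: frequent patterns get short codes and are also the most promising candidates to try first when searching for matches. The frequencies below are invented purely for illustration.

```python
# Sketch of the IC-probability connection: under an optimal code, a
# pattern with probability p gets a code of about -log2(p) bits, so
# frequent patterns are cheap to store and good candidates to try
# first in a search for matches. Frequencies are invented examples.
import math

frequencies = {"t h e": 500, "a p p l e": 40, "s w e e t": 10}
total = sum(frequencies.values())

# Visit patterns most-frequent first, as a search might.
for pattern, freq in sorted(frequencies.items(),
                            key=lambda kv: kv[1], reverse=True):
    p = freq / total
    bits = -math.log2(p)
    print(f"{pattern!r}: p={p:.3f}, ideal code length={bits:.2f} bits")
```

This is only the standard information-theoretic relation, not the SPCM's own calculations, but it shows why compression and probability are two sides of the same coin in the SPS.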
Taking a global view of the SPS, there is considerable potential in future versions for substantial economies in energy demands.
Here is a summary of how SPMAs like the one shown in the figure are formed:

1.
At the beginning of processing, the SPCM has a store of Old SP-patterns, including those shown in rows 1 to 8 (one SP-pattern per row), and many others. When the SPCM is more fully developed, those Old SP-patterns would have been learned from raw data, as outlined in Appendix A.4, but for now they are supplied to the program by the user.

2.
The next step is to read in the New SP-pattern, 't h e a p p l e s a r e s w e e t'.

3.
Then the program searches for 'good' matches between SP-patterns, where 'good' means matches that yield relatively high levels of compression of the New SP-pattern in terms of the Old SP-patterns with which it has been unified. The details of the relevant calculations are given in [29] (Section 4.1) and [16] (Section 3.5).

4.
As can be seen in the figure, matches are identified at early stages between (parts of) the New SP-pattern and (parts of) the Old SP-patterns 'D 17 t h e #D', 'N Nr 6 a p p l e #N', 'V Vp 11 a r e #V', and 'A 21 s w e e t #A'.

5.
Each of these matches may be seen as a partial SPMA. For example, the match between 't h e' in the New SP-pattern and the Old SP-pattern 'D 17 t h e #D' may be seen as an SPMA between the SP-pattern in row 0 and the SP-pattern in row 3.

6.
After unification of the matching symbols, each such SPMA may be seen as a single SP-pattern. So the unification of 't h e' with 'D 17 t h e #D' yields the unified SP-pattern 'D 17 t h e #D', with exactly the same sequence of SP-symbols as the second of the two SP-patterns from which it was derived.

7.
As processing proceeds, similar pair-wise matches and unifications eventually lead to the creation of SPMAs like that shown in Figure A2. At every stage, all the SPMAs that have been created are evaluated in terms of IC (details of the coding are described in [29] (Section 4.1) and [16] (Section 3.5)), and then the best SPMAs are retained and the remainder are discarded. In this case, the overall 'winner' is the SPMA shown in Figure A2.

8.
This process of searching for good SPMAs in stages, with selection of good partial solutions at each stage, is an example of heuristic search. This kind of search is necessary because there are too many possibilities for much to be achieved via exhaustive search within a reasonable time. By contrast, heuristic search can normally deliver results that are reasonably good within a reasonable time, but it cannot guarantee that the best possible solution has been found.
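The staged search just described can be sketched as a tiny beam search: candidate matches between the New SP-pattern and Old SP-patterns are scored by a crude proxy for compression, and only the best few are kept. The scoring function and the beam width are simplified stand-ins for the SPCM's real calculations, used here only to show the shape of the process.

```python
# Minimal sketch of the heuristic search described above: candidate
# matches are scored by a crude compression measure, and only the
# best few partial solutions are retained (beam search). A toy
# stand-in for the SPCM's real scoring, not the SPCM itself.

OLD = ["D 17 t h e #D", "N Nr 6 a p p l e #N",
       "V Vp 11 a r e #V", "A 21 s w e e t #A"]
NEW = "t h e a p p l e s a r e s w e e t"

def content(old):
    """Letters of an Old SP-pattern, without its service symbols."""
    return [s for s in old.split() if len(s) == 1 and s.islower()]

def score(old, new_symbols):
    """Crude compression score: matched letters minus a small code cost."""
    c = content(old)
    n = len(new_symbols)
    matched = max(len(c) if new_symbols[i:i + len(c)] == c else 0
                  for i in range(n)) if c else 0
    return matched - 1  # 1 unit to name the Old pattern in the encoding

BEAM = 3  # keep only the best few partial solutions at each stage
new_symbols = NEW.split()
ranked = sorted(OLD, key=lambda o: score(o, new_symbols), reverse=True)
best = ranked[:BEAM]
print(best)
```

Longer matches score higher because they compress more of the New SP-pattern, so 'a p p l e' and 's w e e t' survive the beam ahead of the shorter matches, mirroring the selection of good partial SPMAs at each stage.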
As noted in the caption to Figure A2, the SPMA in the figure achieves the effect of parsing the sentence into its parts and sub-parts. However, the beauty of the SPMA construct is that it can model many more aspects of intelligence besides the parsing of a sentence. These are summarised in Appendix A.5, although unsupervised learning (Appendix A.4) is a little different from the others.
Why is this principle not more fully recognised in processes for IC such as LZ, JPEG, and MPEG? It seems that this is probably because such processes have been designed to achieve speed on relatively low-powered computers, with corresponding sacrifices in IC efficiency. The DONSVIC principle is likely to become more important with computers that can achieve higher levels of efficiency in IC.

Appendix A.6. Potential Benefits and Applications of the SPS

Apart from its strengths and potential in modelling AI-related functions (Appendix A.5), it appears that, in more humdrum terms, the SPS has several potential benefits and applications, many of them described in peer-reviewed papers. These include:

•
Big data. Somewhat unexpectedly, it has been discovered that the SPS has potential to help solve nine significant problems associated with big data [1]. These are: overcoming the problem of variety in big data; the unsupervised learning of structures and relationships in big data; the interpretation of big data via pattern recognition and natural language processing; the analysis of streaming data; the compression of big data; Model-Based Coding for the efficient transmission of big data; potential gains in computational and energy efficiency in the analysis of big data; the management of errors and uncertainties in data; and the visualisation of structure in big data, with an audit trail in the processing of big data.

•
Autonomous robots. The SPS opens up a radically new approach to the development of intelligence in autonomous robots [32].

•

An intelligent database system. The SPS has potential in the development of an intelligent database system with several advantages compared with traditional database systems [33]. In this connection, the SPS has potential to add several kinds of reasoning and other aspects of intelligence to the 'database' represented by the World Wide Web, especially if the SP Machine were to be supercharged by replacing the search mechanisms in its foundations with the high-parallel search mechanisms of any of the leading search engines.

•
Medical diagnosis. The SPS may serve as a vehicle for medical knowledge and to assist practitioners in medical diagnosis, with potential for the automatic or semi-automatic learning of new knowledge [34].

•

Computer vision and natural vision. The SPS opens up a new approach to the development of computer vision and its integration with other aspects of intelligence. It also throws light on several aspects of natural vision [30].

•

Neuroscience. Abstract concepts in the SP Theory of Intelligence map quite well into concepts expressed in terms of neurons and their interconnections in a version of the theory called SP-Neural ([17], [16] (Chapter 11)). This has potential to illuminate aspects of neuroscience and to suggest new avenues for investigation.

•
Commonsense reasoning. In addition to the previously described strengths of the SPS in several kinds of reasoning, the SPS has potential in the surprisingly challenging area of "commonsense reasoning and commonsense knowledge" [35]. How the SPS may meet the several challenges in this area is described in [36].

•
Other areas of application. The SPS has potential in several other areas of application, including ones described in [37]: the simplification and integration of computing systems; best-match and semantic forms of information retrieval; software engineering [38]; the representation of knowledge, reasoning, and the semantic web; information compression; bioinformatics; the detection of computer viruses; and data fusion.

•
Mathematics. The concept of ICMUP provides an entirely novel interpretation of mathematics [39]. This interpretation is quite unlike anything described in existing writings about the philosophy of mathematics or its application in science. There are potential benefits in science and beyond from this new interpretation of mathematics.
Appendix A.7. SP-Neural

The SPS has been developed primarily in terms of abstract concepts such as the SPMA construct (Appendix A.3). However, a version of the SPS called SP-Neural has also been proposed, expressed in terms of neurons and their inter-connections and inter-communications. Current thinking in that area is described in [17].
Two points of interest are noted here:

•

Neural validation. Although SP-Neural is derived from the SPS, an abstract model of information processing, it maps quite well on to known features of neural tissue in the brain. This may be seen as a kind of neural validation of the abstract model from which it derives.

•
The role of inhibition in the brain. Given the importance of IC as a unifying principle in the SPS, and the prevalence of inhibitory tissue in the brain, the known role of inhibitory neurons in some parts of the nervous system ([40] (p. 505), [41]) suggests that inhibition could prove to be the key to understanding how IC may be achieved in SP-Neural, and hence in real brains.

Appendix A.8. Development of an SP Machine
In view of the strengths and potential of the SPS (Appendix A.5) and its potential benefits and applications (Appendix A.6), the SPCM appears to have promise as the foundation for the development of an SP Machine, as described in [42].
It is envisaged that the SP Machine will feature high levels of parallel processing and a good user interface. It may serve as a vehicle for further development of the SPS by researchers anywhere. Eventually, it should become a system with industrial strength that may be applied to the solution of many problems in government, commerce, and industry. A schematic view of this development is shown in Figure A5.

Figure 2. A schematic view of how, with Model-Based Coding, information may be transmitted efficiently from Alice to Bob.

Figure A4. A schematic representation of versatility and integration in the SPS, with SPMA centre stage.

Figure A5. Schematic representation of the development and application of the SP Machine. Reproduced from Figure 2 in [29], with permission.