The SP Theory of Intelligence: An Overview
Abstract
1. Introduction
2. Origins and Motivation
2.1. Information Compression
- For any given body of information, I, information compression may reduce its size and thus facilitate the storage, processing and transmission of I.
- Perhaps more important is the close connection between information compression and concepts of prediction and probability (see, for example, [14]). In the SP system, it is the basis for all kinds of inference and for calculations of probabilities.
2.2. The Matching and Unification of Patterns
2.3. Simplification and Integration of Concepts
2.4. Transparency in the Representation of Knowledge
2.5. Development of the Theory
3. Introduction to the SP Theory
- The SP theory is conceived as an abstract brain-like system that, in an “input” perspective, may receive “New” information via its senses and store some or all of it in its memory as “Old” information, as illustrated schematically in Figure 1. There is also an “output” perspective, described in Section 4.5.
- The theory is realised in the form of a computer model, introduced in Section 3.1, below, and described more fully later.
- All New and Old information is expressed as arrays (patterns) of atomic symbols in one or two dimensions. An example of an SP pattern may be seen in each row in Figure 4. Each symbol can be matched in an all-or-nothing manner with any other symbol. Any meaning that is associated with an atomic symbol or group of symbols must be expressed in the form of other atomic symbols.
- Each pattern has an associated frequency of occurrence, which may be assigned by the user or derived via the processes for unsupervised learning. The default value for the frequency of any pattern is 1.
- The system is designed for the unsupervised learning of Old patterns by compression of New patterns [23].
- An important part of this process is, where possible, the economical (compressed) encoding of New patterns in terms of Old patterns. This may be seen to achieve such things as pattern recognition, parsing or understanding of natural language, or other kinds of interpretation of incoming information in terms of stored knowledge, including several kinds of reasoning.
- In keeping with the remarks in Section 2.2, compression of information is achieved via the matching and unification (merging) of patterns. In this, there are key roles for the frequency of occurrence of patterns, and their sizes.
- The concept of multiple alignment, described in Section 4, is a powerful central idea, similar to the concept of multiple alignment in bioinformatics, but with important differences.
- Owing to the intimate connection, previously mentioned, between information compression and concepts of prediction and probability, it is relatively straightforward for the SP system to calculate probabilities for inferences made by the system and probabilities for parsings, recognition of patterns, and so on (Section 4.4).
- In developing the theory, I have tried to take advantage of what is known about the psychological and neurophysiological aspects of human perception and cognition and to ensure that the theory is compatible with such knowledge (see Section 14).
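To make the scheme sketched in the bullet points above more concrete, the following is a minimal illustration, in Python, of SP patterns as sequences of atomic symbols with an associated frequency of occurrence, and of the saving obtained when a New pattern is encoded economically in terms of a matching Old pattern. The class names, the greedy matching routine and the bit counts are illustrative assumptions, not the representation or scoring used in the SP computer model.

```python
# A minimal, illustrative sketch (not the SP computer model): an SP "pattern"
# is treated as a tuple of atomic symbols plus a frequency of occurrence.

from dataclasses import dataclass
from typing import Tuple

@dataclass
class Pattern:
    symbols: Tuple[str, ...]
    frequency: int = 1  # default frequency of occurrence, as in the SP model

def matched_symbols(new: Pattern, old: Pattern) -> int:
    """Count symbols of `new` that can be matched, in order, against `old`.
    Each symbol matches in an all-or-nothing manner (no partial similarity)."""
    i = 0
    count = 0
    for s in new.symbols:
        while i < len(old.symbols) and old.symbols[i] != s:
            i += 1
        if i < len(old.symbols):
            count += 1
            i += 1
    return count

def compression_saving(new: Pattern, old: Pattern,
                       bits_per_symbol: int = 8, code_bits: int = 8) -> int:
    """Crude illustration of economical encoding: replacing the matched part of
    `new` by a short code that refers to `old` saves roughly
    (matched symbols * bits_per_symbol) - code_bits bits."""
    return matched_symbols(new, old) * bits_per_symbol - code_bits

old = Pattern(("t", "h", "e", "c", "a", "t"), frequency=5)
new = Pattern(("t", "h", "e", "c", "a", "t"))
print(compression_saving(new, old))  # positive: worth encoding New in terms of Old
```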
3.1. The SP Computer Model
- As an antidote to vagueness. As with all computer programs, processes must be defined with sufficient detail to ensure that the program actually works.
- By providing a convenient means of encoding the simple but important mathematics that underpins the SP theory, and performing relevant calculations, including calculations of probability.
- By providing a means of seeing quickly the strengths and weaknesses of proposed mechanisms or processes. Many ideas that looked promising have been dropped as a result of this kind of testing.
- By providing a means of demonstrating what can be achieved with the theory.
3.2. The SP Machine
3.3. Unfinished Business
- Processing of information in two or more dimensions. No attempt has yet been made to generalise the SP model to work with patterns in two dimensions, although that appears to be feasible to do, as outlined in BK (Section 13.2.1). As noted in BK (Section 13.2.2), it is possible that information with dimensions higher than two may be encoded in terms of patterns in one or two dimensions, somewhat in the manner of architects’ drawings. A 3D structure may be stitched together from several partially-overlapping 2D views, in much the same way that, in digital photography, a panoramic view may be created from partially-overlapping pictures (Sections 6.1 and 6.2 in [27]).
- Recognition of perceptual features in speech and visual images. For the SP system to be effective in the processing of speech or visual images, it seems likely that some kind of preliminary processing will be required to identify low-level perceptual features, such as, in the case of speech, phonemes, formant ratios or formant transitions, or, in the case of visual images, edges, angles, colours, luminances or textures. In vision, at least, it seems likely that the SP framework itself will prove relevant, since edges may be seen as zones of non-redundant information between uniform areas containing more redundancy and, likewise, angles may be seen to provide significant information where straight edges, with more redundancy, come together (Section 3 in [27]). As a stop-gap solution, the preliminary processing may be done using existing techniques for the identification of low-level perceptual features (Chapter 13 in [28]).
- Unsupervised learning. A limitation of the SP computer model as it is now is that it cannot learn intermediate levels of abstraction in grammars (e.g., phrases and clauses), and it cannot learn the kinds of discontinuous dependencies in natural language syntax that are described in Section 8.1 to Section 8.3. I believe these problems are soluble and that solving them will greatly enhance the capabilities of the system for the unsupervised learning of structure in data (Section 5).
- Processing of numbers. The SP model works with atomic symbols, such as ASCII characters or strings of characters with no intrinsic meaning. In itself, the SP system does not recognise the arithmetic meaning of numbers such as “37” or “652” and will not process them correctly. However, the system has the potential to handle mathematical concepts if it is supplied with patterns representing Peano’s axioms or similar information (BK, Chapter 10). As a stop-gap solution in the SP machine, existing technologies may provide whatever arithmetic processing may be required.
4. The Multiple Alignment Concept
4.1. Coding and the Evaluation of an Alignment in Terms of Compression
- (1) Scan the multiple alignment from left to right, looking for columns that contain an ID-symbol by itself, not aligned with any other symbol.
- (2) Copy these symbols into a code pattern in the same order that they appear in the multiple alignment.
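The two steps above may be expressed as a short sketch. Here, a multiple alignment is assumed, purely for illustration, to be a list of columns, each column recording the symbols aligned at that position and which of them are ID-symbols; the SP model's own data structures and procedures are richer than this.

```python
from typing import List, NamedTuple

class Column(NamedTuple):
    symbols: List[str]      # symbols appearing in this column of the alignment
    id_symbols: List[str]   # which of those symbols are ID-symbols

def derive_code_pattern(alignment: List[Column]) -> List[str]:
    """Step (1): scan columns left to right for an ID-symbol standing alone,
    i.e., a column containing exactly one symbol, that symbol being an ID-symbol.
    Step (2): copy such symbols, in order, into the code pattern."""
    code = []
    for col in alignment:
        if len(col.symbols) == 1 and col.symbols[0] in col.id_symbols:
            code.append(col.symbols[0])
    return code

# Example: the unaligned ID-symbols "S" and "#S" become the code for the alignment.
alignment = [
    Column(["S"], ["S"]),       # ID-symbol by itself -> goes into the code
    Column(["t", "t"], []),     # matched contents symbols -> not coded
    Column(["h", "h"], []),
    Column(["e", "e"], []),
    Column(["#S"], ["#S"]),     # ID-symbol by itself -> goes into the code
]
print(derive_code_pattern(alignment))  # ['S', '#S']
```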
4.1.1. Compression Difference and Compression Ratio
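This overview does not reproduce the formulae, but the two measures can be summarised informally as follows. Writing $B_N$ for the number of bits needed to represent, in raw form, those symbols from New that are encoded by a given multiple alignment, and $B_E$ for the number of bits in the code pattern derived from that alignment (both symbols are introduced here only for illustration), the compression difference and compression ratio are, in essence:

$$ CD = B_N - B_E, \qquad CR = \frac{B_N}{B_E} $$

A larger CD or CR indicates a multiple alignment that encodes New more economically in terms of Old.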
4.2. The Building of Multiple Alignments
- (1) Identify a set of “driving” patterns and a set of “target” patterns. At the beginning, the New pattern is the sole driving pattern, and the Old patterns are the target patterns. In all subsequent stages, the best of the multiple alignments formed so far (in terms of their scores) are chosen to be driving patterns, and the target patterns are the Old patterns together with a selection of the best multiple alignments formed so far, including all of those that are driving patterns.
- (2) Compare each driving pattern with each of the target patterns to find full matches and good partial matches between patterns. This is done with a process that is essentially a form of “dynamic programming” [33], somewhat like the WinMerge utility for finding similarities and differences between files [34]. The process is described quite fully in BK (Appendix A) and outlined in Section 4.2.1, below. The main difference between the SP process and others is that the former can deliver several alternative matches between a pair of patterns, while WinMerge and standard methods for finding alignments deliver one “best” result.
- (3) From the best of the matches found in the current stage, create corresponding multiple alignments and add them to the repository of multiple alignments created by the program.
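The three steps above amount to an iterative, heuristic search. The sketch below shows the shape of that outer loop in Python; the match-finding and scoring routines are crude stand-ins for the processes of Section 4.2.1 and Section 4.1, and the data structures are not those of the SP computer model.

```python
# A schematic sketch of the outer loop described in steps (1)-(3) above.
# Patterns are tuples of symbols; an "alignment" here is simply a merged set of
# symbols with a crude score. These stand-ins are illustrative assumptions only.

def crude_match_score(a, b):
    """Stand-in for the evaluation of Section 4.1: count shared symbols."""
    return len(set(a) & set(b))

def find_matches(driving, target):
    """Stand-in for the matching process of Section 4.2.1: return a merged
    'alignment' (here, just the union of symbols) if anything matches."""
    if crude_match_score(driving, target) == 0:
        return []
    return [tuple(sorted(set(driving) | set(target)))]

def build_multiple_alignments(new_pattern, old_patterns, n_stages=3, beam=5):
    repository = []                 # alignments built so far, best first
    driving = [new_pattern]         # stage 1: New is the sole driving pattern
    for _ in range(n_stages):
        # target patterns: Old patterns plus a selection of the best alignments
        targets = old_patterns + repository[:beam]
        found = []
        for d in driving:
            for t in targets:
                found.extend(find_matches(d, t))
        repository = sorted(set(repository + found),
                            key=lambda a: crude_match_score(a, new_pattern),
                            reverse=True)
        driving = repository[:beam]  # the best alignments drive the next stage
    return repository

old = [("t", "h", "e"), ("c", "a", "t"), ("d", "o", "g")]
print(build_multiple_alignments(("t", "h", "e", "c", "a", "t"), old)[:2])
```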
4.2.1. Finding Good Matches between Patterns
- (1) The query is processed left to right, one symbol at a time.
- (2) Each symbol in the query is, in effect, broadcast to every symbol in the database to make a yes/no match in each case.
- (3) Every positive match (hit) between a symbol from the query and a symbol in the database is recorded in a hit structure, illustrated in the figure.
- (4) If the memory space allocated to the hit structure is exhausted at any time, then the hit structure is purged: the leaf nodes of the tree are sorted in reverse order of their probability values, and each leaf node in the bottom half of the set is extracted from the hit structure, together with all nodes on its path that are not shared with any other path. After the hit structure has been purged, the recording of hits may continue using the space that has been released.
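The following sketch mimics steps (1) to (4) in Python. The "hit structure" here is a flat list rather than the tree described above, and the purging rule is a crude stand-in for the probability-ordered purging of the SP model.

```python
# An illustrative sketch of steps (1)-(4) above, not the SP model's procedure.

def broadcast_hits(query, database, max_hits=1000):
    hits = []                                     # the (simplified) hit structure
    for qi, qsym in enumerate(query):             # (1) process query left to right
        for di, dsym in enumerate(database):      # (2) broadcast to every symbol
            if qsym == dsym:                      # (3) record each positive match
                hits.append((qi, di))
            if len(hits) > max_hits:              # (4) purge: keep the first half,
                hits = hits[: max_hits // 2]      #     a stand-in for probability-
    return hits                                   #     ordered purging

def best_ordered_match(hits):
    """Assemble hits, greedily, into one left-to-right (ordered) match."""
    match, last_q, last_d = [], -1, -1
    for qi, di in hits:
        if qi > last_q and di > last_d:
            match.append((qi, di))
            last_q, last_d = qi, di
    return match

query = list("thecat")
database = list("xxthexxcatxx")
print(best_ordered_match(broadcast_hits(query, database)))
```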
4.2.2. Noisy Data
4.3. Computational Complexity
4.4. Calculation of Probabilities Associated with Multiple Alignments
4.4.1. Absolute Probabilities
4.4.2. Relative Probabilities
- (1) For the multiple alignment with the highest compression score (we shall call this the reference multiple alignment), identify the reference set of symbols in New, meaning the symbols from New which are encoded by the multiple alignment.
- (2) Compile a reference set of multiple alignments, which includes the reference multiple alignment and all other multiple alignments (if any) which encode exactly the reference set of symbols from New, neither more nor less.
- (3) Calculate the sum of the absolute probabilities (Section 4.4.1) of the multiple alignments in the reference set.
- (4) For each multiple alignment in the reference set, calculate its relative probability as its absolute probability divided by that sum.
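In other words, relative probabilities are the absolute probabilities of the multiple alignments in the reference set, normalised so that they sum to 1. A minimal sketch, with illustrative names:

```python
# Illustrative only: normalise absolute probabilities over the reference set of
# multiple alignments (those that encode exactly the same symbols from New).

def relative_probabilities(absolute_probs):
    total = sum(absolute_probs)                 # step (3): sum over the set
    return [p / total for p in absolute_probs]  # step (4): absolute / sum

print(relative_probabilities([0.008, 0.002]))   # -> [0.8, 0.2]
```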
4.4.3. A Generalisation of the Method for Calculating Absolute and Relative Probabilities
4.4.4. Relative Probabilities of Patterns and Symbols
- (1) Compile a set of patterns from Old, each of which appears at least once in the reference set of multiple alignments. No single pattern from Old should appear more than once in the set.
- (2) For each pattern, calculate a value for its relative probability as the sum of the relative probabilities of the multiple alignments in which it appears. If a pattern appears more than once in a multiple alignment, it is only counted once for that multiple alignment.
- (3) Compile a set of symbol types which appear anywhere in the patterns identified in step 2.
- (4) For each alphabetic symbol type identified in step 3, calculate its relative probability as the sum of the relative probabilities of the patterns in which it appears. If it appears more than once in a given pattern, it is only counted once.
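A minimal sketch of this aggregation, with illustrative names and data (the same procedure applies to symbol types as to patterns):

```python
# Illustrative sketch of the steps above: aggregate the relative probabilities
# of multiple alignments into probabilities for the Old patterns (and, likewise,
# symbol types) that appear in them. Each alignment is represented here simply
# as (relative_probability, set_of_old_patterns).

def pattern_probabilities(alignments):
    probs = {}
    for p_rel, patterns in alignments:
        for pattern in set(patterns):      # count a pattern once per alignment
            probs[pattern] = probs.get(pattern, 0.0) + p_rel
    return probs

alignments = [
    (0.8, {"<bird> ... <#bird>", "<penguin> ... <#penguin>"}),
    (0.2, {"<bird> ... <#bird>"}),
]
print(pattern_probabilities(alignments))
# e.g. {'<bird> ... <#bird>': 1.0, '<penguin> ... <#penguin>': 0.8}
```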
4.5. One System for Both the Analysis and the Production of Information
5. Unsupervised Learning
5.1. Outline of Unsupervised Learning in the SP Model
5.1.1. Deriving Old Patterns from Multiple Alignments
5.1.2. Evaluating and Selecting Sets of Newly-Created Old Patterns
5.1.3. Plotting Values for G, E and T
5.1.4. Limitations in the SP Model and How They May Be Overcome
5.1.5. Computational Complexity
5.2. The Discovery of Natural Structures Via Information Compression (DONSVIC)
- Figure 10 shows part of a parsing of an unsegmented sample of natural language text created by the MK10 program [40] using only the information in the sample itself and without any prior dictionary or other knowledge about the structure of language. Although all spaces and punctuation had been removed from the sample, the program does reasonably well in revealing the word structure of the text. Statistical tests confirm that it performs much better than chance.
- The same program does quite well—significantly better than chance—in revealing phrase structures in natural language texts that have been prepared, as before, without spaces or punctuation—but with each word replaced by a symbol for its grammatical category [41]. Although that replacement was done by a person trained in linguistic analysis, the discovery of phrase structure in the sample is done by the program, without assistance.
- The SNPR program for grammar discovery [38] can, without supervision, derive a plausible grammar from an unsegmented sample of English-like artificial language, including the discovery of words, of grammatical categories of words, and the structure of sentences.
- In a similar way, with samples of English-like artificial languages, the SP model has demonstrated an ability to learn plausible structures, including words, grammatical categories of words and the structure of sentences.
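The sketch below is not the MK10 or SNPR algorithm. It is only a toy illustration of the DONSVIC idea that frequently recurring groups of contiguous symbols, found by nothing more sophisticated than counting and merging, tend to correspond to natural units such as words.

```python
# Toy illustration of the DONSVIC idea (not MK10 or SNPR): repeatedly merge the
# most frequent adjacent pair of units, so that frequently recurring chunks
# (often word-like) emerge from unsegmented text by counting alone.

from collections import Counter

def discover_chunks(text, merges=20):
    units = list(text)
    for _ in range(merges):
        pairs = Counter(zip(units, units[1:]))
        if not pairs:
            break
        (a, b), freq = pairs.most_common(1)[0]
        if freq < 2:
            break                          # nothing recurs: stop merging
        merged, i = [], 0
        while i < len(units):
            if i + 1 < len(units) and units[i] == a and units[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(units[i])
                i += 1
        units = merged
    return units

print(discover_chunks("thecatsatonthematthecatlikesthemat"))
```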
5.3. Generalisation, the Correction of Overgeneralisations and Learning from Noisy Data
- Given that we learn from a finite sample [42], represented by the smallest envelope in the figure, how do we generalise from that finite sample to a knowledge of the language corresponding to the middle-sized envelope, without overgeneralising into the region between the middle envelope and the outer one?
- How do we learn a “correct” version of our native language despite what is marked in the figure as “dirty data” (sentences that are not complete, false starts, words that are mispronounced, and more)?
- As a general rule, the greatest reductions in T (the overall measure described in Section 5.1.3) are achieved with grammars that represent moderate levels of generalisation, neither too little nor too much. In practice, the SNPR program, which is designed to minimise T, has been shown to produce plausible generalisations, without over-generalising [38].
- Any particular error is, by its nature, rare, and so, in the search for useful patterns (which, other things being equal, are the more frequently-occurring ones), it is discarded from the grammar along with other “bad” structures [44]. In the case of lossless compression, errors in any given body of data, I, would be retained in the encoding of I. However, with learning, it is normally the grammar and not the encoding that is the focus of interest. In practice, the MK10 and SNPR programs have been found to be quite insensitive to errors (of omission, addition or substitution) in their data, much as in the building of multiple alignments (Section 4.2.2).
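In terms of the quantities plotted in Section 5.1.3, and reading them in a standard minimum-description-length way (a gloss, not a quotation from the original papers), the principle can be summarised as choosing, among candidate grammars, the one that minimises

$$ T = G + E $$

where $G$ is the size in bits of the grammar and $E$ is the size in bits of the sample encoded in terms of that grammar. A rote-learning “grammar” makes $E$ very small but $G$ very large, while an over-general grammar makes $G$ small but inflates $E$, because each sentence then needs a longer code to distinguish it from the many alternatives that the grammar allows; minimising $T$ therefore favours intermediate levels of generalisation.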
5.4. One-Trial Learning and Its Implications
6. Computing, Mathematics and Logic
6.1. Conventional Computing Systems
6.2. Mathematics and Logic
6.3. Computing and Probabilities
- It appears that computing, mathematics and logic are more probabilistic than our ordinary experience of them might suggest. Gregory Chaitin has written: “I have recently been able to take a further step along the path laid out by Gödel and Turing. By translating a particular computer program into an algebraic equation of a type that was familiar even to the ancient Greeks, I have shown that there is randomness in the branch of pure mathematics known as number theory. My work indicates that—to borrow Einstein’s metaphor—God sometimes plays dice with whole numbers.” (p. 80 in [49]).
- The SP system may imitate the clockwork nature of ordinary computers by delivering probabilities of 0 and 1. This can happen with certain kinds of data, with tight constraints on the process of searching the abstract space of alternative matches, or with both of those things.
- It seems likely that the all-or-nothing character of conventional computers has its origins in the low computational power of early computers. In those days, it was necessary to apply tight constraints on the process of searching for matches between patterns. Otherwise, the computational demands would have been overwhelming. Similar things may be said about the origins of mathematics and logic, which have been developed for centuries without the benefit of any computational machine, except very simple and low-powered devices. Now that it is technically feasible to apply large amounts of computational power, constraints on searching may be relaxed.
7. Representation of Knowledge
8. Natural Language Processing
- Both the parsing and production of natural language may be modelled via the building of multiple alignments (Section 4.5; BK, Section 5.7).
- The system can accommodate syntactic ambiguities in language (BK, Section 5.2) and also recursive structures (BK, Section 5.3).
- The framework provides a simple, but effective means of representing discontinuous dependencies in syntax (Section 8.1 to Section 8.3, below; BK, Sections 5.4 to 5.6).
- The system may also model non-syntactic “semantic” structures, such as class-inclusion hierarchies and part-whole hierarchies (Section 9.1).
- Because there is one simple format for different kinds of knowledge, the system facilitates the seamless integration of syntax with semantics (BK, Section 5.7).
- The system is robust in the face of errors of omission, commission or substitution in data (Section 4.2.2 and Section 5.3).
- The importance of context in the processing of language [52] is accommodated in the way the system searches for a global best match for patterns: any pattern or partial pattern may be a context for any other.
8.1. Discontinuous Dependencies in Syntax
8.2. Two Quasi-Independent Patterns of Constraint in English Auxiliary Verbs
- Each letter represents a category for a single word:
  – “M” stands for “modal” verbs, like “will”, “can”, “would”, etc.
  – “H” stands for one of the various forms of the verb, “to have”.
  – Each of the two instances of “B” stands for one of the various forms of the verb, “to be”.
  – “V” stands for the main verb, which can be any verb, except a modal verb (unless the modal verb is used by itself).
- The words occur in the order shown, but any of the words may be omitted.
- Questions of “standard” form follow exactly the same pattern as statements, except that the first verb, whatever it happens to be (“M”, “H”, the first “B”, the second “B” or “V”), precedes the subject noun phrase instead of following it.
It will have been being washed.
   M    H    B    B     V
Will it have been being washed?
M       H    B    B     V
- Apart from the modals, which always have the same form, the first verb in the sequence, whatever it happens to be (“H”, the first “B”, the second “B” or “V”), always has a “finite” form (the form it would take if it were used by itself with the subject).
- If an “M” auxiliary verb is chosen, then whatever follows it (“H”, first “B”, second “B” or “V”) must have an “infinitive” form (i.e., the “standard” form of the verb as it occurs in the context, “to ...”, but without the word “to”).
- If an “H” auxiliary verb is chosen, then whatever follows it (the first “B”, the second “B” or “V”) must have a past tense form, such as “been”, “seen”, “gone”, “slept”, “wanted”, etc. In Chomsky’s Syntactic Structures [53], these forms were characterised as en forms, and the same convention has been adopted here.
- If the first of the two “B” auxiliary verbs is chosen, then whatever follows it (the second “B” or “V”) must have an ing form, e.g., “singing”, “eating”, “having”, “being”, etc.
- If the second of the two “B” auxiliary verbs is chosen, then whatever follows it (only the main verb is possible now) must have a past tense form (marked with en, as above).
- The constraints apply to questions in exactly the same way as they do to statements.
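The constraints just listed can be paraphrased procedurally, as in the sketch below. This is only an illustration: in the SP system itself, the constraints are expressed declaratively as patterns and applied via multiple alignment, as shown in Section 8.3. The category and form labels used here are assumptions for the example.

```python
# A procedural paraphrase of the constraints listed above. Each auxiliary, if
# present, dictates the form of whatever verb follows it.

REQUIRED_FORM_AFTER = {
    "M": "infinitive",   # a modal is followed by an infinitive form
    "H": "en",           # "have" is followed by a past ("en") form
    "B1": "ing",         # the first "be" is followed by an "ing" form
    "B2": "en",          # the second "be" is followed by an "en" form
}

def check_verb_sequence(seq):
    """seq is a list of (category, form) pairs in the order M H B1 B2 V,
    with any of the categories omitted. Returns True if the constraints hold."""
    order = ["M", "H", "B1", "B2", "V"]
    cats = [c for c, _ in seq]
    if [c for c in order if c in cats] != cats:
        return False                                  # words out of order
    for (cat, _), (_, next_form) in zip(seq, seq[1:]):
        required = REQUIRED_FORM_AFTER.get(cat)
        if required is not None and next_form != required:
            return False
    return True

# "will have been being washed": M H B1 B2 V
print(check_verb_sequence([("M", "finite"), ("H", "infinitive"),
                           ("B1", "en"), ("B2", "ing"), ("V", "en")]))  # True
```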
8.3. Multiple Alignments and English Auxiliary Verbs
- The first verb, “is”, is marked as having the finite form (with the symbol “FIN” in columns 5 and 7). The same word is also marked as being a form of the verb “to be” (with the symbol “B” in columns 4, 5 and 6). Because of its position in the parsing, we know that it is an instance of the second “B” in the sequence “M H B B V”.
- The second verb, “washed”, is marked as being in the en category (with the symbol “EN” in columns 1 and 4).
- That a verb corresponding to the second instance of “B” must be followed by an en kind of verb is expressed by the pattern, “B XV EN”, in column 4.
- The first verb, “will”, is marked as modal (with “M” in columns 7, 8 and 14).
- The second verb, “have”, is marked as having the infinitive form (with “INF” in columns 11 and 14), and it is also marked as a form of the verb, “to have” (with “H” in columns 11, 12, and 15).
- That a modal verb must be followed by a verb of infinitive form is marked with the pattern, “M INF”, in column 14.
- The third verb, “been”, is marked as being a form of the verb, “to be” (with “B” in columns 2, 3 and 16). Because of its position in the parsing, we know that it is an instance of the second “B” in the sequence, “M H B B V”. This verb is also marked as belonging to the en category (with “EN” in columns 2 and 15).
- That an “H” verb must be followed by an “EN” verb is marked with the pattern, “H EN”, in column 15.
- The fourth verb, “broken”, is marked as being in the en category (with “EN” in columns 4 and 16).
- That a “B” verb (second instance) must be followed by an “EN” verb is marked with the pattern, “B XV EN”, in column 16.
9. Pattern Recognition
- It can model pattern recognition at multiple levels of abstraction, as described in BK (Section 6.4.1), and with the integration of class-inclusion relations with part-whole hierarchies (Section 9.1; BK, Section 6.4.1).
- The SP system can accommodate “family resemblance” or polythetic categories, meaning that recognition does not depend on the presence or absence of any particular feature or combination of features. This is because there can be alternatives at any or all locations in a pattern and, also, because of the way the system can tolerate errors in data (next point).
- The system is robust in the face of errors of omission, commission or substitution in data (Section 4.2.2).
- The system facilitates the seamless integration of pattern recognition with other aspects of intelligence: reasoning, learning, problem solving, and so on.
- A probability may be calculated for any given identification, classification or associated inference (Section 4.4).
9.1. Class Hierarchies, Part-Whole Hierarchies and Their Integration
9.2. Inference and Inheritance
10. Probabilistic Reasoning
10.1. Nonmonotonic Reasoning and Reasoning with Default Values
10.1.1. Typically, Birds Fly
P penguin Bd f cannotfly #f #Bd ... #P
O ostrich Bd f cannotfly #f #Bd ... #O.
10.1.2. Tweety is a Bird, So, Probably, Tweety Can Fly
10.1.3. Tweety Is a Penguin, So Tweety Cannot Fly
10.2. Reasoning in Bayesian Networks, Including “Explaining Away”
Normally an alarm sound alerts us to the possibility of a burglary. If somebody calls you at the office and tells you that your alarm went off, you will surely rush home in a hurry, even though there could be other causes for the alarm sound. If you hear a radio announcement that there was an earthquake nearby and if the last false alarm you recall was triggered by an earthquake, then your certainty of a burglary will diminish. (pp. 8–9 in [59])
10.2.1. Representing Contingencies with Patterns and Frequencies
10.2.2. Approximating the Temporal Order of Events
10.2.3. Other Considerations
- No attempt has been made to represent the idea that “the last false alarm you recall was triggered by an earthquake” (p. 9 in [59]). At some stage in the development of the SP system, there will be a need to take account of recency (BK, Section 13.2.6).
- With these imaginary frequency values, it has been assumed that burglaries (with a total frequency of occurrence of 1,160) are much more common than earthquakes (with a total frequency of 100). As we shall see, this difference reinforces the belief that there has been a burglary when it is known that the alarm has gone off (but without additional knowledge of an earthquake).
- In accordance with Pearl’s example (p. 49 in [59]) (but contrary to the phenomenon of looting during earthquakes), it has been assumed that earthquakes and burglaries are independent. If there was some association between them, then, in accordance with the closed-world assumption, there should be a pattern in Figure 22 representing the association.
10.2.4. Formation of Alignments: The Burglar Alarm Has Sounded
| Symbol | Probability |
| --- | --- |
| alarm | 1.0 |
| burglary | 0.328 |
| earthquake | 0.016 |
10.3. Formation of Alignments: The Burglar Alarm Has Sounded and There Is a Radio Announcement of an Earthquake
10.3.1. Other Possibilities
- A burglary (which triggered the alarm) and, at the same time, an earthquake (which led to a radio announcement) or
- An earthquake that triggered the alarm and led to a radio announcement and, at the same time, a burglary that did not trigger the alarm.
- Many other unlikely possibilities of a similar kind ([59], also discussed in Section 2.2.4 of this article).
10.4. The SP Framework and Bayesian Networks
- Undue complexity in the storage of statistical knowledge. Each node in a Bayesian network contains a table of conditional probabilities for all possible combinations of inputs, and these tables can be quite large. By contrast, the SP framework only requires a single measure of frequency for each pattern. A focus on frequencies seems to yield an overall advantage in terms of simplicity compared with the representation of statistical knowledge in the form of conditional probabilities.
- Diverting attention from simpler alternatives. By emphasising probabilities, Bayes’ theorem diverts attention away from simpler and more primitive concepts of matching and unification of patterns, which, by hypothesis, provide the foundation for several aspects of intelligence (Section 2.2).
- No place for structural learning. Bayes’ theorem assumes that the objects and categories that are to be related to each other via conditional probabilities are already “given”. It has nothing to say about how ontological knowledge may be created from raw perceptual input. By contrast, the SP framework provides for the discovery of objects and other categories via the matching and unification of patterns, in accordance with the DONSVIC principle (Section 5.2).
10.5. Causal Diagnosis
10.6. An SP Approach to Causal Diagnosis
- The input-output relations of any component may be represented as a set of patterns, each one with a measured or estimated frequency of occurrence.
- With suitable extensions, these patterns may serve to transfer the output of one component to the input of another.
- A “framework” pattern (shown at the bottom of Figure 27) is needed to ensure that appropriate multiple alignments can be built.
10.7. Multiple Alignments in Causal Diagnosis
| Bad Component(s) | Relative Probability |
| --- | --- |
| M1 | 0.6664 |
| M4 | 0.3332 |
| M1, M3 | 0.00013 |
| M1, M2 | 0.00013 |
| M1, M4 | 6.664e-5 |
| M3, M4 | 6.664e-5 |
| M1, M2, M3 | 2.666e-8 |
11. Information Storage and Retrieval
- The storage and retrieval of information is integrated with other aspects of intelligence, such as pattern recognition, reasoning, planning, problem solving and learning—as outlined elsewhere in this article.
- The SP system provides a simple but effective means of combining class hierarchies with part-whole hierarchies, with inheritance of attributes (Section 9.1).
- It provides for cross-classification with multiple inheritance.
- There is flexibility and versatility in the representation of knowledge arising from the fact that the system does not distinguish “parts” and “attributes” (Section 4.2.1 in [35]).
- Likewise, the absence of a distinction between “class” and “object” facilitates the representation of knowledge and eliminates the need for a “metaclass” (Section 4.2.2 in [35]).
- SP patterns provide a simpler and more direct means of representing entity-relationship models than do relational tuples (Section 4.2.3 in [35]).
12. Planning and Problem Solving
large circle above small triangle
C1 small square inside large ellipse ; D small square inside large circle #C1
C2 small square inside large ellipse ; E large square above small ellipse #C2
C3 small square inside large ellipse ; F small circle left-of large square #C3
C4 small square inside large ellipse ; G small ellipse above large rectangle #C4
13. Compression of Information
- The discovery of recurrent patterns in data via the building of multiple alignments, with heuristic search to sift out the patterns that are most useful in terms of compression.
- The potential of the system to detect and encode discontinuous dependencies in data. It appears that there is potential here to extract kinds of redundancy in information that are not accessible via standard methods for the compression of information.
14. Human Perception, Cognition and Neuroscience
15. Conclusions
Acknowledgements
Conflict of Interest
References and Notes
- Apart from the period between early 2006 and late 2012, when I was working on other things.
- See www.cognitionresearch.org/sp.htm#PUBS.
- Wolff, J.G. Unifying Computing and Cognition: the SP Theory and Its Applications; CognitionResearch.org.uk: Menai Bridge, UK, 2006.
- Some of the text and figures in this article come from the book, with permission. Details of other permissions are given at appropriate points in the article.
- Attneave, F. Some informational aspects of visual perception. Psychol. Rev. 1954, 61, 183–193.
- Barlow, H.B. Sensory Mechanisms, the Reduction of Redundancy, and Intelligence. In The Mechanisation of Thought Processes; Her Majesty’s Stationery Office: London, UK, 1959; pp. 535–559.
- Barlow, H.B. Trigger Features, Adaptation and Economy of Impulses. In Information Processes in the Nervous System; Leibovic, K.N., Ed.; Springer: New York, NY, USA, 1969; pp. 209–230.
- Also relevant and still of interest is Zipf’s [68] Human Behaviour and the Principle of Least Effort. Incidentally, Barlow later suggested that “... the [original] idea was right in drawing attention to the importance of redundancy in sensory messages ... but it was wrong in emphasizing the main technical use for redundancy, which is compressive coding.” (p. 242 in [69]). As we shall see, the SP theory is closer to Barlow’s original thinking than what he said later.
- This focus on compression of information in binocular vision is distinct from the more usual interest in the way that slight differences between the two images enables us to see the scene in depth.
- Wolff, J.G. Learning Syntax and Meanings through Optimization and Distributional Analysis. In Categories and Processes in Language Acquisition; Levy, Y., Schlesinger, I.M., Braine, M.D.S., Eds.; Lawrence Erlbaum: Hillsdale, NJ, USA, 1988; pp. 179–215.
- See www.cognitionresearch.org/lang_learn.html.
- Solomonoff, R.J. A formal theory of inductive inference. Part I. Inf. Control 1964, 7, 1–22.
- Solomonoff, R.J. A formal theory of inductive inference. Part II. Inf. Control 1964, 7, 224–254.
- Li, M.; Vitányi, P. An Introduction to Kolmogorov Complexity and Its Applications; Springer: New York, NY, USA, 2009.
- Newell, A. You can’t Play 20 Questions with Nature and Win: Projective Comments on the Papers in This Symposium. In Visual Information Processing; Chase, W.G., Ed.; Academic Press: New York, NY, USA, 1973; pp. 283–308.
- Laird, J.E. The Soar Cognitive Architecture; MIT Press: Cambridge, MA, USA, 2012.
- Anderson, J.R.; Bothell, D.; Byrne, M.D.; Douglass, S.; Lebiere, C.; Qin, Y. An integrated theory of the mind. Psychol. Rev. 2004, 111, 1036–1060.
- Schmidhuber, J.; Thórisson, K.R.; Looks, M. (Eds.) Artificial General Intelligence: 4th International Conference, AGI 2011, Mountain View, CA, USA, August 3–6, 2011, Proceedings; Volume 6830, Lecture Notes in Artificial Intelligence; Springer: New York, NY, USA, 2011.
- Dodig-Crnkovic, G. Significance of models of computation, from Turing model to natural computation. Minds Mach. 2011, 21, 301–322.
- Steunebrink, B.R.; Schmidhuber, J. A family of Gödel machine implementations. In [18]. Available online: www.idsia.ch/juergen/agi2011bas.pdf (accessed on 31 July 2013).
- Hutter, M. Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability; Springer: Berlin, Germany, 2005.
- Wolff, J.G. Simplicity and power—some unifying ideas in computing. Comput. J. 1990, 33, 518–534.
- Of course, people can and do learn with assistance from teachers and others. However, unsupervised learning has been a focus of interest in developing the SP theory, since it is clear that much of our learning is done without assistance and because unsupervised learning raises some interesting issues and yields some useful insights, as outlined in Section 5.2.
- The source code for the models, with associated documents and files, may be downloaded via links under the heading “SOURCE CODE” at the bottom of the page on http://bit.ly/WtXa3g (accessed on 5 August 2013).
- As in ordinary search engines and, indeed, in the brains of people and other animals, high levels of parallelism are needed to achieve speedy processing with large data sets (see also Section 4.3 and Section 5.1.5).
- Wolff, J.G. The SP theory of intelligence: Benefits and applications. 2013. in preparation. Available online: http://bit.ly/12YmQJW (accessed on 31 July 2013).
- Wolff, J.G. Application of the SP theory of intelligence to the understanding of natural vision and the development of computer vision. 2013. in preparation. Available online: http://bit.ly/Xj3nDY (accessed on 31 July 2013).
- Prince, S.J.D. Computer Vision: Models, Learning, and Inference; Cambridge University Press: Cambridge, UK, 2012.
- Whether multiple alignments are shown with patterns in rows or in columns depends largely on what fits best on the page.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: New York, NY, USA, 1991.
- Although this scheme is slightly less efficient than the well-known Huffman scheme, it has been adopted, because, unlike the Huffman scheme, it does not produce anomalous results when probabilities are derived from code sizes, as described in BK (Section 3.7).
- See, for example, “Sequence alignment”, Wikipedia. Available online: en.wikipedia.org/wiki/Sequence_alignment (accessed on 8 May 2013).
- Sankoff, D.; Kruskal, J.B. Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparisons; Addison-Wesley: Reading, MA, USA, 1983.
- WinMerge, latest stable version 2.14.0; Open Source differencing and merging tool for Windows. Available online: http://winmerge.org (accessed on 31 July 2013).
- Wolff, J.G. Towards an intelligent database system founded on the SP theory of computing and cognition. Data Knowl. Eng. 2007, 60, 596–624.
- Dorigo, M.; Gambardella, L.M. Ant colony system: A cooperative learning approach to the traveling salesman problem. IEEE Trans. Evol. Comput. 1997, 1, 53–66.
- Thus “computing as compression” does not imply that all redundancy is bad and should be removed. Redundancy in information is often useful in, for example, understanding speech in noisy conditions (cf., Section 4.2.2) or in backup copies for data.
- Wolff, J.G. Language acquisition, data compression and generalization. Lang. Commun. 1982, 2, 57–89.
- In this and other examples in this subsection, we shall assume that letters are analogues of low-level perceptual features in speech, such as formant ratios or formant transitions.
- Wolff, J.G. The discovery of segments in natural language. Br. J. Psychol. 1977, 68, 97–106.
- Wolff, J.G. Language acquisition and the discovery of phrase structure. Lang. Speech 1980, 23, 255–269.
- The Chomskian doctrine that children are born with a knowledge of “universal grammar” fails to account for the specifics of syntactic forms in different languages, and it depends on the still-unproven idea that there is something of substance that is shared by all the world’s languages.
- Relevant evidence comes from cases where children learn to understand language even though they have little or no ability to speak [70,71]—so that there is little or nothing for anyone to correct.
- If an error is not rare, it is likely to acquire the status of a dialect or idiolect variation and cease to be regarded as an error.
- Such as: learning in the kinds of artificial neural network that are popular in computer science; Hebb’s [66] concept of learning; Pavlovian learning; and Skinnerian learning.
- Turing, A.M. On computable numbers, with an application to the Entscheidungsproblem. Proc. Lond. Math. Soc. 1936, 42, 230–265.
- Turing, A.M. On computable numbers, with an application to the Entscheidungsproblem: a correction. Proc. Lond. Math. Soc. 1937, 43, 544–546.
- Post, E.L. Formal reductions of the general combinatorial decision problem. Am. J. Math. 1943, 65, 197–268.
- Chaitin, G.J. Randomness in arithmetic. Sci. Am. 1988, 259, 80–85.
- Wolff, J.G. The SP Theory and the Representation and Processing of Knowledge. In Soft Computing in Ontologies and Semantic Web; Ma, Z., Ed.; Springer-Verlag: Heidelberg, Germany, 2006; pp. 79–101.
- Wolff, J.G. Medical diagnosis as pattern recognition in a framework of information compression by multiple alignment, unification and search. Decis. Support Syst. 2006, 42, 608–625.
- Iwanska, L.; Zadrozny, W. Introduction to special issue on context in natural language processing. Comput. Intell. 1997, 13, 301–308.
- Chomsky, N. Syntactic Structures; Mouton: The Hague, The Netherlands, 1957.
- Pereira, F.C.N.; Warren, D.H.D. Definite clause grammars for language analysis—a survey of the formalism and a comparison with augmented transition networks. Artif. Intell. 1980, 13, 231–278.
- In this figure, the sentence, “it is wash ed”, could have been represented more elegantly as, “i t i s w a s h e d”, as in previous examples. The form shown here has been adopted, because it helps to stop multiple alignments growing too large. Likewise, with Figure 14.
- Oliva, A.; Torralba, A. The role of context in object recognition. Trends Cogn. Sci. 2007, 11, 520–527.
- Although the term “heterarchy” is not widely used, it can be useful as a means of referring to hierarchies in which, as in the example in the text, a given node may appear in two or more higher-level nodes that are not themselves hierarchically related. In the SP framework, there may be heterarchies in both class-inclusion structures and part-whole structures. However, to avoid the clumsy expression “hierarchy or heterarchy”, the term “hierarchy” is used in most parts of this article as a shorthand for both concepts.
- Pothos, E.M.; Wolff, J.G. The Simplicity and Power model for inductive inference. Artif. Intell. Rev. 2006, 26, 211–225.
- Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, revised second printing ed.; Morgan Kaufmann: San Francisco, CA, USA, 1997.
- Likewise, a travel booking clerk using a database of all flights between cities will assume that, if no flight is shown between, say, Edinburgh and Paris, then no such flight exists. In systems like Prolog, the closed-world assumption is the basis of “negation as failure”: If a proposition cannot be proven with the clauses provided in a Prolog program, then, in terms of that store of knowledge, the proposition is assumed to be false.
- Some of the frequencies shown in Figure 22 are intended to reflect the two probabilities suggested for this example in [59] (p. 49): “... the [alarm] is sensitive to earthquakes and can be accidentally (p = 0.20) triggered by one. ... if an earthquake had occurred, it surely (p = 0.40) would be on the [radio] news.”
- Evans, T.G. A program for the solution of a class of geometric-analogy intelligence-test questions. In Semantic Information Processing; Minsky, M.L., Ed.; MIT Press: Cambridge, MA, USA, 1968; pp. 271–353.
- Belloti, T.; Gammerman, A. Experiments in solving analogy problems using Minimal Length Encoding. Appl. Decis. Technol. 1996, 95, 209–220.
- Gammerman, A.J. The representation and manipulation of the algorithmic probability measure for problem solving. Ann. Math. Artif. Intell. 1991, 4, 281–300.
- Pothos, E.M.; Busemeyer, J.R. Can quantum probability provide a new direction for cognitive modeling? Behav. Brain Sci. 2013, 36, 255–327.
- Hebb, D.O. The Organization of Behaviour; John Wiley & Sons: New York, NY, USA, 1949.
- See, for example, “Artificial neural network”. Wikipedia. Available online: en.wikipedia.org/wiki/Artificial_neural_network (accessed on 31 July 2013).
- Zipf, G.K. Human Behaviour and the Principle of Least Effort; Hafner: New York, NY, USA, 1949.
- Barlow, H.B. Redundancy reduction revisited. Netw. Comput. Neural Syst. 2001, 12, 241–253.
- Lenneberg, E.H. Understanding language without the ability to speak. J. Abnorm. Soc. Psychol. 1962, 65, 419–425.
- Brown, R. A First Language: The Early Stages; Penguin: Harmondsworth, UK, 1973.
© 2013 by the author; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).