If we could develop artificial intelligence that mimics how our brains seem to work, would we understand the brain better? Jeff Hawkins is passionate in his conviction that AI will help us understand the brain, and has developed a model of how the brain works, which he calls Hierarchical Temporal Memory. Although the name is a little overwhelming, the essentials of his framework seem to work.
While many of us probably feel instinctively that there is an innate, unknowable something that makes us intelligent, Hawkins considers that a much simpler definition of intelligence is possible. For Hawkins, intelligence (for both man and machine) is the ability to recognise patterns and to make predictions from those patterns (Hawkins, 2004). Most of those predictions happen at an unconscious level, but when a prediction turns out to be wrong, we notice it. Current research, including research on autism and schizophrenia, does seem to confirm that the brain is probably continually making predictions (Friston, 2011).
So how is it possible to get from a definition of intelligence as the ability to remember and predict patterns to rocket science type intelligence? Surely the conceptual understanding required to solve an algebra problem involves more than predicting what the answer will be, especially when you consider that by definition, a problem is something you don’t know the answer to. Where do the patterns come from and how can you recognise patterns if you don’t know what pattern you’re looking for? How are those patterns learned? Doesn’t that learning of patterns mean that there’s already some intelligence there to do the learning?
Hawkins’ insight is that the teacher is time (the temporal part of Hierarchical Temporal Memory). The brain assumes that things that happen close together in time are probably connected. The statue that we see from a slightly different angle as we move our head is still the same statue, so our brain connects the new pattern that we are seeing with the earlier pattern. We recognise similarities over time, and eventually those similarities become a library of recognised patterns. The patterns can be grouped together to form more complex patterns, which become part of a hierarchy of patterns that eventually might become the knowledge that we use to identify an animal as a dog, be it big or small, close or far away, sitting or running. Similar hierarchies let us recognise a shape on a page as a letter regardless of font, group the letters into words, and make sense of what we are reading. (See the “Everyone knows you’re a dog” example.)
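One way to picture "time as the teacher" is the minimal Python sketch below. It is not Hawkins' algorithm. It only assumes that patterns seen in consecutive moments probably belong together, and merges them into one group once they have followed each other often enough. The function name `group_by_time`, the `min_count` threshold and the view labels are all hypothetical, chosen purely for illustration.

```python
from collections import Counter


def group_by_time(stream, min_count=2):
    """Group patterns that repeatedly occur next to each other in time."""
    # Count how often each unordered pair of patterns appears consecutively.
    pair_counts = Counter()
    for earlier, later in zip(stream, stream[1:]):
        if earlier != later:
            pair_counts[frozenset((earlier, later))] += 1

    # Start with every pattern in its own group (a simple union-find).
    parent = {pattern: pattern for pattern in stream}

    def find(pattern):
        while parent[pattern] != pattern:
            pattern = parent[pattern]
        return pattern

    # Merge the groups of patterns that were neighbours often enough.
    for pair, count in pair_counts.items():
        if count >= min_count:
            a, b = pair
            parent[find(a)] = find(b)

    groups = {}
    for pattern in parent:
        groups.setdefault(find(pattern), set()).add(pattern)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical views: we look around a statue, and later around a dog.
views = ["statue_front", "statue_left", "statue_front", "statue_left",
         "dog_side", "dog_front", "dog_side", "dog_front"]
print(group_by_time(views))
# → [['dog_front', 'dog_side'], ['statue_front', 'statue_left']]
```

The single transition from the statue to the dog happens only once, so it falls below the threshold and the two objects stay in separate groups.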
So intelligence is the ability to remember patterns and make predictions, and the cortex learns by identifying patterns based on the frequency with which they occur. Neuronal circuits are also continually getting feedback about the accuracy of their predictions, and this feedback is used to update the information in the circuit so that future predictions will be more accurate. This means that we are simultaneously and continually predicting and learning. At this stage it’s only a theory, but it’s elegant in its simplicity and it seems to stand up to scrutiny. It’s even possible to show that there are mathematical reasons for short-term memory being limited to Miller’s magic number of seven plus or minus two (Linhares, Chada, & Aranha, 2011). It’s the way the tapestry of detail meshes with what we do know about the brain that makes the theory enticing enough to suggest it may be possible to decipher the spaghetti muddle of somewhere between 30,000,000,000 and 100,000,000,000 interconnected neurons.
From algorithm to concept
Whenever we notice a pattern, we automatically and unconsciously make predictions based on that pattern. Most of the predictions are mundane and don’t require our conscious attention, but we are alerted to anything that is unexpected (the phone rings) or requires our attention (catching a ball). Not only do we continually make predictions, but each prediction also carries an estimated probability that it is correct. The closer that probability is to 100 per cent, the better we know something. If we can’t make a prediction at all, then we don’t know something. If there is more than one possible prediction, our brain automatically selects the one with the highest probability of being right, i.e. the most frequently occurring pattern. The amazing part about the brain is that the unconscious process of recognising a pattern and making the most likely prediction based on that pattern all happens very quickly. So quickly that even while you are reading this, you will probably anticipate what word will be at the end of this……………
sentence.
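The idea of choosing the most frequently occurring pattern can be illustrated with a small Python sketch. It is a deliberately crude stand-in, not a model of the cortex: it assumes the only pattern is "which word followed which", and the function names `learn_followers` and `predict_next` and the training text are invented for this example.

```python
from collections import Counter, defaultdict


def learn_followers(words):
    """Count which word follows which in the training text."""
    followers = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        followers[current][following] += 1
    return followers


def predict_next(followers, word):
    """Return (most frequent follower, estimated probability)."""
    counts = followers.get(word)
    if not counts:
        return None, 0.0  # nothing to go on: we "don't know"
    best, best_count = counts.most_common(1)[0]
    return best, best_count / sum(counts.values())


text = "the dog barks the dog barks the dog bites the cat sleeps".split()
followers = learn_followers(text)

print(predict_next(followers, "the"))   # → ('dog', 0.75)
print(predict_next(followers, "fish"))  # → (None, 0.0)
```

Here "the" was followed by "dog" three times out of four, so "dog" is predicted with an estimated probability of 0.75, while a word that has never been seen yields no prediction at all.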
It seems that the processing that the brain is continually doing consists of three basic steps:
1. Searching for a pattern
2. Making a prediction based on that pattern
3. Checking whether the prediction is correct.
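The three steps above can be sketched as a loop in Python. This is only an illustration under strong assumptions: the "pattern" is simply the previous event, the memory is a table of counts, and the name `run_prediction_loop` and the event stream are hypothetical. The point is that predicting, checking and learning all happen in the same pass, and that only a failed prediction is reported.

```python
from collections import Counter, defaultdict


def run_prediction_loop(stream):
    """Predict each item from the previous one, check it, then learn."""
    memory = defaultdict(Counter)  # previous item -> counts of what followed
    surprises = []
    previous = None
    for position, actual in enumerate(stream):
        if previous is not None:
            # Steps 1 and 2: look up the stored pattern and predict from it.
            seen = memory[previous]
            predicted = seen.most_common(1)[0][0] if seen else None
            # Step 3: check the prediction; a mismatch gets our attention.
            if predicted is not None and predicted != actual:
                surprises.append((position, predicted, actual))
            # Feedback: update the memory whether or not we were right.
            seen[actual] += 1
        previous = actual
    return surprises


# Hypothetical stream of events: a routine that is broken once.
events = ["tick", "tock", "tick", "tock", "tick", "ring", "tick", "tock"]
print(run_prediction_loop(events))
# → [(5, 'tock', 'ring')]
```

All the correct predictions pass silently. Only the unexpected "ring" at position 5, where "tock" was predicted, is flagged.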
A traditional computer program to do this would consist of numerous lines of code, and would probably have a number of patterns or rules supplied by the programmer. The brain, however, learns the patterns it needs without any external tampering with its neuronal circuitry, and can identify a pattern, make a prediction, and check the result of that prediction in fewer than one hundred steps. We know it must be fewer than one hundred steps because transmitting a signal from one neuron to another is slow compared with signal transmission in a computer. In the brain, an electrical signal in a neuron triggers the release of a chemical that carries the signal across the gap (synapse) between neurons, and this chemical signal is then converted back into an electrical signal in the next neuron. It would take half a second to transmit a signal along a path of one hundred neurons (Hawkins, 2004), and we routinely recognise things in less time than that, so clearly the algorithms used by the brain to make predictions are very efficient.
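The arithmetic behind the one-hundred-step limit can be worked through from the figures quoted above. The symbols are introduced here only for illustration: t_step is the time for one neuron-to-neuron transmission, t_task is the time a recognition task takes (assumed here to be at most half a second), and n_max is the longest chain of neurons the signal could have passed through.

```latex
t_{\text{step}} = \frac{0.5\ \text{s}}{100} = 5\ \text{ms}
\qquad\Rightarrow\qquad
n_{\max} = \frac{t_{\text{task}}}{t_{\text{step}}}
         \le \frac{500\ \text{ms}}{5\ \text{ms}} = 100
```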
That efficiency comes from the way the patterns are organised. The main features of these patterns are:
1. Patterns are stored in sequences, so one part of a pattern lets us recall the next part of a pattern (recalling the previous part of the pattern is much harder – what letter comes before X?)
2. Patterns are stored in a hierarchy. Smaller patterns are grouped into larger patterns, and those larger patterns are in turn grouped into still larger patterns. An obvious but coarse example is reading, where parts of a shape are grouped to form a letter, letters are grouped into words, and words are associated with a meaning which may depend on the other words around them. The sentence “She had a tear in her dress and a tear in her eye” uses the same spelling for “tear” twice, but with two different pronunciations and meanings based on the surrounding words.
3. Patterns are able to trigger memories of themselves, enabling us to supply missing inf_rm_t__n. We can recognise a tune even if we start hearing it in the middle. If we are reasonably familiar with the tune we can hum or sing along, predicting the next notes in the sequence. (However, we often have to get to the end of the melody before we can recall its opening notes, because the notes are remembered in sequence.)
4. Patterns are stored in an invariant form. This means we can recognise that an animal is a dog regardless of breed, or whether it is a live dog, a statue or a photo of a dog, and it doesn’t matter whether the dog is facing us, side on, sitting down or standing up. Predictions are based on these invariant forms. From the invariant pattern for a dog we know that dogs can bark. The invariant form for a photo is that it is a static image, so a photo of a dog will not bark. We recognise a tune regardless of whether it is played on a single instrument, played by an orchestra, or hummed. It doesn’t matter if it is played in a different key or at a slightly different tempo; we still recognise the tune. Our brains automatically focus on the most relevant information, and ignore unnecessary details.
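The tune example touches on features 1, 3 and 4 at once, and a short Python sketch can make them concrete. It rests on one assumption that is not from Hawkins: that the invariant form of a melody can be approximated by the steps between successive notes rather than the notes themselves. The function names `to_intervals` and `complete_tune` are hypothetical, and notes are written as MIDI note numbers.

```python
def to_intervals(notes):
    """Invariant form of a tune: pitch steps between notes, not the notes."""
    return [b - a for a, b in zip(notes, notes[1:])]


def complete_tune(known_intervals, heard_notes):
    """Find the heard fragment in a known tune and predict the next notes."""
    fragment = to_intervals(heard_notes)
    size = len(fragment)
    for start in range(len(known_intervals) - size + 1):
        if known_intervals[start:start + size] == fragment:
            # Replay the rest of the stored sequence from the last heard note.
            predicted, note = [], heard_notes[-1]
            for step in known_intervals[start + size:]:
                note += step
                predicted.append(note)
            return predicted
    return None  # the fragment does not match anything we know


# Opening of "Twinkle, Twinkle, Little Star" in C major (C C G G A A G).
tune = [60, 60, 67, 67, 69, 69, 67]
memory = to_intervals(tune)

# A fragment from the middle, heard five semitones higher (a different key).
heard = [72, 72, 74]
print(complete_tune(memory, heard))
# → [74, 72]
```

Because only the steps between notes are stored, the fragment is recognised even though none of its notes appear in the original, and the prediction continues forwards from the point where the fragment matched. Nothing in the stored sequence makes it easy to run backwards to the opening notes.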
It seems that the brain is able to recognise information based on patterns it has previously encountered, and can use those patterns to add extra information to the current situation, match the relevant information to a more generalised form, and use that form to make predictions. A real-life example might be as follows. You hear a sound, and recognise that it is a dog barking close by (pattern recognition enabling you to progress through a pattern hierarchy to associate the sound with a dog). You also recognise that the bark more closely matches the sound of a large dog, so you know it is likely to be a large dog (providing missing information), and you know that most dogs are very friendly, but sometimes dogs bite (from your invariant form for dogs). You might combine this with information you have just gained by interpreting another hierarchy of patterns, one that allows you to read and understand a nearby sign saying “Guard dog on premises”, to reach the sensible conclusion that climbing over the fence to retrieve a ball might not be a good idea.
So a concept is a micro-circuit in our brain cells, created by recognising patterns which are then generalised into an invariant form. That micro-circuit continually makes predictions, which become more accurate over time as the micro-circuit modifies itself (learns) in response to feedback about the accuracy of its predictions.
There’s a concept for you to think about!
Related articles:
Why can’t a computer be more like a brain: Jeff Hawkins
Hierarchical Temporal Memory: Wikipedia
References:
Friston, K. (2011). Prediction, perception and agency. International Journal of Psychophysiology. doi:10.1016/j.ijpsycho.2011.11.014
George, D., & Hawkins, J. (2009). Towards a mathematical theory of cortical micro-circuits. PLoS Computational Biology, 5(10), e1000532. doi:10.1371/journal.pcbi.1000532
Hawkins, J. (2004). On Intelligence. New York, N.Y.: St. Martin’s Griffin.
Hawkins, J., Ahmad, S., & Dubinsky, D. (2011). Hierarchical Temporal Memory including Cortical Learning Algorithms. Retrieved from http://www.numenta.com/htm-overview/education/HTM_CorticalLearningAlgorithms.pdf
Linhares, A., Chada, D. M., & Aranha, C. N. (2011). The emergence of Miller’s magic number on a sparse distributed memory. PLoS ONE, 6(1), e15592. doi:10.1371/journal.pone.0015592