If we could develop artificial intelligence that mimics how our brains seem to work, would we understand the brain better? Jeff Hawkins is passionate in his conviction that artificial intelligence will help us understand the brain, and he has developed a model of how the brain works that he calls Hierarchical Temporal Memory (HTM). Although the name is a little overwhelming, the essentials of his framework seem sound.
While many of us probably feel instinctively that there is some innate, unknowable something that makes us intelligent, Hawkins argues that a much simpler definition of intelligence is possible. He defines intelligence (for both man and machine) as the ability to recognise patterns and to make predictions from those patterns (Hawkins, 2004). Most of those predictions happen at an unconscious level, but when a prediction is made (the ball will bounce when it hits the ground) and turns out to be wrong, we notice (the ball didn’t bounce; what’s happening?). Current research does seem to confirm that the brain is continually making predictions, including research on autism and schizophrenia (Friston, 2011).
So how is it possible to get from a definition of intelligence as the ability to remember and predict patterns to rocket-science levels of intelligence? Surely the conceptual understanding required to solve an algebra problem involves more than predicting what the answer will be, especially when you consider that, by definition, a problem is something you don’t know the answer to. Where do the patterns come from, and how can you recognise patterns if you don’t know what pattern you’re looking for? How are those patterns learned? Doesn’t that learning of patterns mean there’s already some intelligence there to do the learning?
Hawkins’ insight is that the teacher is time (the temporal part of Hierarchical Temporal Memory). The brain assumes that things that happen close together in time are probably connected. The statue that we see from a slightly different angle as we move our head is still the same statue, so our brain connects the new pattern with the earlier one. We recognise similarities over time, and eventually those similarities become part of a library of recognised patterns. The patterns are grouped in a hierarchy, from small, detailed, quickly changing patterns at the bottom to more generalized, slowly changing patterns at the top. This ability to connect and generalize patterns, something traditional AI programs find particularly difficult, lets us identify an animal as a dog whether it is big or small, close or far away, sitting or running. Similar hierarchies let us recognise a shape on a page as a letter regardless of its size or font, group letters into words and words into sentences, and so make sense of what we are reading. (See the “Everyone knows you’re a dog” example.)
In HTM theory, intelligence is the ability to make predictions from remembered patterns, and learning occurs as the cortex compares its prediction with what is actually present or occurring, using the comparison to adjust the recalled pattern. Neuronal circuits have more feedback connections than feedforward connections, and it is assumed that this feedback continually updates the information in the circuit so that future predictions will be more accurate. This means we are continually predicting and learning at the same time. At this stage it’s only a theory, but it’s elegant in its simplicity and it seems to stand up to scrutiny. It’s even possible to show that there are mathematical reasons for short-term memory being limited to Miller’s magic number of seven plus or minus two (Linhares, Chada, & Aranha, 2011). It’s the way this tapestry of detail meshes with what we do know about the brain that makes the theory enticing enough to suggest it may just be possible to unravel the processes going on in the spaghetti muddle of somewhere between 30 and 100 billion interconnected neurons inside the human brain.
From algorithm to concept
Although we are continually noticing previously encountered patterns and making predictions from them, this process is usually unconscious. Most of the predictions are mundane and don’t require our conscious attention. We notice what we are deliberately paying attention to (the ball we are trying to catch) or something that hasn’t been predicted (the phone rings), but most of the time we don’t notice the feel of the clothes on our skin (they feel exactly as we unconsciously predicted they would). Not only do we continually make predictions, but these predictions are automatically weighted by an estimated probability of being correct. If there are two or more possible predictions, our brain automatically selects the one with the highest probability of being right, i.e. the pattern that best matches the circumstances. The amazing part is that this unconscious process of recognising a pattern and making the most likely prediction from it all happens very quickly. So quickly that even while you are reading this, you will probably anticipate what word will be at the end of this……………
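This probability-weighted selection can be caricatured with a toy next-word predictor. Everything here (the corpus, the `following` table, the `predict` function) is an illustrative invention, not part of HTM: count how often each word has followed each other word, then predict the continuation seen most often.

```python
from collections import Counter, defaultdict

# A toy corpus standing in for previously experienced sequences.
corpus = "the ball will bounce when the ball hits the ground".split()

# Learn: count which word follows which (a simple bigram table).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict(word):
    """Return the most frequent continuation, i.e. the prediction
    with the highest estimated probability of being right."""
    candidates = following.get(word)
    if not candidates:
        return None  # no stored pattern, so no prediction
    return candidates.most_common(1)[0][0]

print(predict("the"))     # 'ball' ("the" was followed by "ball" twice, "ground" once)
print(predict("ground"))  # None (nothing ever followed "ground")
```

A real cortex, of course, does nothing so literal as counting bigrams; the sketch only shows what "select the prediction with the highest probability" means operationally.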
To summarize so far, HTM theory assumes that the continual information processing in the brain includes three basic steps:
1. Searching for a pattern.
2. Whenever possible, making a prediction based on that pattern.
3. Checking whether the prediction is correct.
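The three steps above can be caricatured in a few lines of Python. The `memory` dictionary and the two-item context are illustrative assumptions, far simpler than anything in a real cortex:

```python
# Stored sequence patterns: a context of two elements predicts the next one.
memory = {("do", "re"): "mi", ("re", "mi"): "fa"}

def step(recent, actual_next):
    pattern = tuple(recent[-2:])           # 1. search for a known pattern
    predicted = memory.get(pattern)        # 2. predict the next element from it
    correct = (predicted == actual_next)   # 3. check the prediction
    if not correct:
        memory[pattern] = actual_next      # learn: update the stored pattern
    return predicted, correct

print(step(["do", "re"], "mi"))   # ('mi', True)  -> prediction confirmed
print(step(["mi", "fa"], "so"))   # (None, False) -> surprise; new pattern stored
print(step(["mi", "fa"], "so"))   # ('so', True)  -> learned from feedback
```

The point of the sketch is the loop, not the data structure: predict, compare with reality, and fold the mismatch back into memory.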
A traditional computer program to do this would consist of numerous lines of code and would probably have a number of patterns or rules supplied by the programmer. The brain, however, learns the patterns it needs without any external tampering with its neuronal circuitry, and can identify a pattern, make a prediction, and check the result of that prediction in fewer than one hundred steps. We know it must be fewer than one hundred steps because transmitting a nerve signal from one neuron to another is slow compared to signal transmission in a computer: an electrical signal in a neuron triggers the release of chemicals that carry the signal across the gap (synapse) to the next neuron, where the chemical signal is converted back into an electrical one. It would take about half a second to transmit a signal along a path of one hundred neurons (Hawkins, 2004). Most of our predictions are made in less than half a second (e.g. the falling glass will break if it hits the floor, so try to catch it before it lands), so the algorithms the brain uses to make predictions must be very efficient, involving fewer than one hundred serial neuronal steps.
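The arithmetic behind this hundred-step argument is easy to check, assuming (as the argument does) a delay of roughly 5 ms per neuron-to-neuron hop:

```python
# Back-of-the-envelope check of the "one hundred steps" argument.
ms_per_step = 5                 # approximate synaptic delay in milliseconds
path_length = 100               # neurons in a serial chain
total_s = ms_per_step * path_length / 1000
print(total_s)                  # 0.5 -> half a second for 100 serial steps

# Conversely, a prediction made within half a second can rest on a
# serial chain of at most this many neuron-to-neuron steps:
budget_ms = 500
max_steps = budget_ms // ms_per_step
print(max_steps)                # 100
```

The conclusion only bounds the number of *serial* steps; the brain compensates with massive parallelism across billions of neurons.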
That efficiency comes from the way the patterns are organised. The main features of these patterns are:
1. Patterns are stored in sequences, so one part of a pattern lets us recall the next part of a pattern (recalling the previous part of the pattern is much harder – what letter comes before X?)
2. Patterns are stored in a hierarchy. Smaller patterns are grouped into larger patterns, and those larger patterns are then grouped into larger patterns. Larger patterns are more likely to be generalized and involve the context in which the pattern occurs. An obvious but coarse example is reading, where parts of a shape are grouped to form a letter, letters are grouped into words, and words are associated with a meaning which may depend on the other words around it. The sentence “She had a tear in her eye when she saw the tear in her dress” has two different pronunciations and meanings for “tear” based on the context supplied by the surrounding words.
3. Patterns are able to trigger memories of themselves, enabling us to supply missing inf_rm_t__n. We can recognise a tune even if we start hearing the tune in the middle of the tune. If we are reasonably familiar with the tune we can hum or sing along, predicting the next notes in the sequence. (However we often have to get to the end of the melody before we can recall the beginning notes of the tune, because the notes are remembered in sequence). Humming a tune backwards is particularly challenging, as we usually don’t have a pattern for the reversed version of the tune.
4. Patterns are stored in an invariant form. This means we can recognise that the animal shape we see is a dog regardless of breed, or whether it is a live dog, a statue or a photo of a dog, and it doesn’t matter whether the dog is facing us, side on, sitting still or running. Predictions are based on these invariant forms. From the invariant pattern for a dog we know that dogs can bark. The invariant form for a photo is that it is a static image, so a photo of a dog will not bark. We recognise a tune regardless of whether it is played on a single instrument, played by an orchestra, or hummed. It doesn’t matter if it is played in a different key or at a slightly different tempo; we still recognise it. Our brain automatically focuses on the most relevant information, ignores superfluous details, and uses the matched pattern to supply additional information in the form of inferences and/or predictions.
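Feature 3, recall from a fragment that works forwards but not backwards, can be sketched as a search through stored sequences. The `tunes` dictionary and its note lists are invented for illustration:

```python
# Stored sequences: hearing a fragment mid-way lets us find the
# sequence it belongs to and predict what comes next.
tunes = {
    "twinkle": ["C", "C", "G", "G", "A", "A", "G"],
    "scale":   ["C", "D", "E", "F", "G", "A", "B"],
}

def complete(fragment):
    """Find a stored sequence containing the fragment and return its
    name plus the notes that follow, like humming along from the middle."""
    n = len(fragment)
    for name, notes in tunes.items():
        for i in range(len(notes) - n + 1):
            if notes[i:i + n] == fragment:
                return name, notes[i + n:]   # forward recall is easy
    return None  # a reversed fragment matches nothing: no backward pattern

print(complete(["G", "G", "A"]))   # ('twinkle', ['A', 'G'])
print(complete(["A", "G", "G"]))   # None: we can't "hum it backwards"
```

Because the sequences are stored in one direction only, the same mechanism that makes humming along effortless makes humming a tune backwards hard, just as in feature 1.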
A real-life example might run as follows. You hear a sound and recognise it as a dog barking close by (pattern recognition enabling you to associate the sound with a living dog and to recall the general features of dogs). The bark more closely matches the sound usually made by a large dog, so you infer that it is probably a large dog (using pattern prediction and inference to supply missing information about the dog’s size), and you know from your generalized invariant pattern for dogs that most dogs are friendly but that dogs sometimes bite. You might combine this information with information gained by interpreting another hierarchy of patterns, the one that lets you read and understand a nearby sign saying “Guard dog on premises”, to reach the sensible conclusion that climbing over the fence to retrieve your ball might not be a good idea.
So a concept is a micro-circuit in our brain, created by recognising patterns, generalising them into an invariant form, and organising them into a hierarchy running from multiple details up to a summarized overview. That micro-circuit continually makes predictions, which become more accurate over time as the micro-circuit modifies itself (learns) by adapting in response to feedback about the accuracy of its predictions.
There’s a concept for you to think about!
Hawkins, J. (2007). Why can’t a computer be more like a brain? IEEE Spectrum, 44(4).
Friston, K. (2011). Prediction, perception and agency. International Journal of Psychophysiology. doi:10.1016/j.ijpsycho.2011.11.014
George, D., & Hawkins, J. (2009). Towards a mathematical theory of cortical micro-circuits. PLoS Computational Biology, 5(10), e1000532. doi:10.1371/journal.pcbi.1000532
Hawkins, J. (2004). On Intelligence. New York, NY: St. Martin’s Griffin.
Hawkins, J., Ahmad, S., & Dubinsky, D. (2011). Hierarchical Temporal Memory including Cortical Learning Algorithms. Retrieved from http://www.numenta.com/htm-overview/education/HTM_CorticalLearningAlgorithms.pdf
Linhares, A., Chada, D. M., & Aranha, C. N. (2011). The emergence of Miller’s magic number on a sparse distributed memory. PLoS ONE, 6(1), e15592. doi:10.1371/journal.pone.0015592