Intelligent thinking

What do we think with?

What happens inside our brains when we learn? We do know that the grey matter inside our head is an incredibly complex maze of nerve cells. It’s estimated that we have at least thirty billion (30,000,000,000) nerve cells (neurons) in our cerebral cortex, that thin wrinkled covering of the brain where we do our thinking. Each neuron consists of multiple incoming branches (dendrites) that receive information, and multiple outgoing branches that transmit a signal to another neuron. The signal switches between an electrical signal in the neurons – there’s enough electricity in your brain to light up a 20 watt light bulb – and a chemical signal in the tiny gaps (synapses) between neurons. Each neuron has several thousand or more connections to other neurons (Hawkins, Ahmad, & Dubinsky, 2011). Any way you look at it, it’s a lot of scrambled spaghetti.

With all those electrical circuits it’s very tempting to think of the brain as a sophisticated parallel processing multitasking computer, except that  somehow that neuronal spaghetti can recognise images and understand spoken language, and do things that so far computers find really hard to do, such as catching a ball. Does a computer ever “know” that it’s forgotten something?



Neuron  (Image from http://www.biologymad.com/nervoussystem/nervoussystemintro.htm)

Jeff Hawkins believes that  the way to make sense of the complexity of the human brain is to develop artificial intelligence that closely mimics what we know about how the brain works (George & Hawkins, 2009). He’s the first to admit that it’s an ambitious project, but his track record with the development of the hugely successful Palm Pilots that were able to recognise handwriting adds some credibility to the venture.

He starts from the observation, originally made by Mountcastle in 1978, that although different mental activities (processing speech, doing mathematical calculations etc.) occur in different parts of the cortex, there is no significant specialization of the structure of the cortex from one region to another. The difference between one area of the cortex and another is primarily that it receives different types of raw data to process. The actual processing of that data is done in much the same way in all areas of the cortex, be it processing vision, touch, sound etc.

How then does the brain do what it does? What data is being processed, and how? Many now believe that the primary function of any part of the cortex is to recognise patterns and that the cortex constantly makes predictions based on those patterns (Friston, 2011; Hawkins, 2004).

What is intelligence?

In a nutshell, according to Hawkins, intelligence (for both man and machine) is the ability to recognise patterns, and to make predictions from those patterns (Hawkins, 2004). Most of those predictions happen at an unconscious level, but when a prediction is made and turns out not to be correct then we notice it. Current research, including research on autism and schizophrenia, does seem to support the idea that the brain is continually making predictions (Friston, 2011).

At present, while I am writing this, I am constantly making unconscious predictions. The keyboard keys feel just the way they felt before; the sky is still overcast because it is raining heavily. If a key were suddenly to jam, or not be there, I would become conscious of it and notice it. If the sky suddenly became clear and blue again, I would notice it. (Hey what happened to the cloud? It’s still raining and there’s a clear sky! Weird!) As I type, I make predictions about what letter I need to type next, based on previous patterns I have learned. I’m not very good at touch typing, so I don’t have good pattern recognition of where each keyboard letter is located. This inability to make accurate predictions about where keyboard letters are when I type means that I am more likely to make mistakes because I’m unable to predict that I’m about to make a mistake.

So how is it possible to get from a definition of intelligence as the ability to remember and predict patterns to rocket science type intelligence?  Surely the conceptual understanding required to solve an algebra problem involves more than predicting what the answer will be, especially when you consider that by definition, a problem is something you don’t know the answer to.  Where do the patterns come from and how can you recognise patterns if you don’t know what pattern you’re looking for? How are those patterns learned? Doesn’t that learning of patterns mean that there’s already some intelligence there to do the learning?

Hawkins’ insight is that the teacher is time. The brain makes the assumption that things that happen close in time are probably connected. The statue that we are looking at from a slightly different angle as we move our head is still the same statue, so our brain connects the new pattern that we are seeing with the earlier pattern. We recognise similarities over time, and eventually those similarities become a library of recognised patterns. The patterns can be grouped together to form more complex patterns, which become part of a hierarchy of patterns that eventually might become the knowledge that we use to identify an animal as a dog, be it big or small, close or far away, sitting or running. Similar hierarchies let us recognise a shape on a page as a letter, regardless of font, group the letters into words, and make sense of what we are reading. (See “Everyone knows you’re a dog” example)

So intelligence is the ability to remember patterns and make predictions, and the cortex learns by identifying patterns based on the frequency with which they occur. Neuronal circuits are also continually getting feedback about the accuracy of the predictions, and this feedback is used to continually update the information in the circuit so that future predictions will be more accurate. This means that we are continually and simultaneously predicting and learning. At this stage it’s only a theory, but it’s elegant in its simplicity and it seems to stand up to scrutiny. It’s even possible to show that there are mathematical reasons for short-term memory being limited to Miller’s magic number of seven plus or minus two (Linhares, Chada, & Aranha, 2011). It’s the way the tapestry of detail meshes with what we do know about the brain that makes the theory enticing enough to suggest it may be possible to decipher the spaghetti muddle.

From algorithm to concept

Whenever we notice a pattern, we automatically and unconsciously make predictions based on that pattern. Most of the predictions are mundane and don’t require our conscious attention, but we are alerted to anything that is unexpected (the phone rings) or requires our attention (catching a ball). Not only do we continually make predictions, but these predictions are automatically based on an estimated probability that the prediction is correct.  The closer the probability of a prediction being correct is to 100 per cent the better we know something.  If we can’t make a prediction at all, then we don’t know something.  If there are two possible predictions, our brain automatically selects the prediction with the highest probability of being right, i.e. the most frequently occurring pattern. The amazing part about the brain is that the unconscious process of recognising a pattern and making the most likely prediction based on that pattern all happens very quickly. So quickly that even while you are reading this, you will probably anticipate what word will be at the end of this……………

sentence.

It seems that the processing that the brain is continually doing consists of three basic steps:

1. Searching for a pattern

2. Making a prediction based on that pattern

3. Checking whether the prediction is correct.
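The three steps above can be sketched in a few lines of Python. This is only a toy illustration and not Hawkins’ cortical algorithm: the class name `SequencePredictor`, the word stream and the choice to count which word follows which are all assumptions made for the example. It does show the basic idea of picking the most frequent pattern, attaching an estimated probability to it, and learning from every check.

```python
from collections import Counter, defaultdict


class SequencePredictor:
    """Toy next-item predictor (hypothetical): counts what follows what."""

    def __init__(self):
        # follow_counts[a][b] = how often b has been seen right after a
        self.follow_counts = defaultdict(Counter)

    def predict(self, current):
        """Steps 1 and 2: find the stored pattern, return (guess, probability)."""
        seen = self.follow_counts.get(current)
        if not seen:
            return None, 0.0  # no pattern yet, so no prediction is possible
        guess, count = seen.most_common(1)[0]
        return guess, count / sum(seen.values())

    def observe(self, current, actual):
        """Step 3: check the prediction, then learn from what really happened."""
        guess, _ = self.predict(current)
        self.follow_counts[current][actual] += 1
        return guess == actual


def run(words):
    """Feed a stream of words through the predictor and collect the surprises."""
    predictor = SequencePredictor()
    surprises = []
    for current, actual in zip(words, words[1:]):
        if not predictor.observe(current, actual):
            surprises.append((current, actual))
    return predictor, surprises


words = "the dog barks the dog barks the dog bites".split()
predictor, surprises = run(words)
print(predictor.predict("dog"))  # "barks", with an estimated probability of 2/3
print(surprises[-1])             # ('dog', 'bites'): the prediction that failed
```

Early on every word is a surprise because nothing has been learned yet. Once the pattern has repeated, only the unexpected “bites” stands out, much like the jammed key or the suddenly clear sky.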

A traditional computer program to do this would consist of numerous lines of code, and would probably have a number of patterns or rules supplied by the programmer. However, the brain learns the patterns it needs without any external tampering with its neuronal circuitry, and can identify a pattern, make a prediction, and check the results of a prediction in fewer than one hundred steps. We know it must be fewer than one hundred steps because transmitting a nerve signal from one neuron to another is relatively slow compared to signal transmission in a computer, yet we routinely recognise things in about half a second or less. In the brain, an electrical signal from a neuron triggers the release of a chemical to carry the signal across the gap (synapse) between neurons, and this chemical signal is then converted back into an electrical signal in the next neuron. It would take half a second to transmit a signal along a path of one hundred neurons (Hawkins, 2004), so clearly the algorithms used by the brain to make predictions are very efficient.
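The arithmetic behind the hundred-step limit is simple enough to spell out. The sketch below uses only the two figures quoted above (half a second, one hundred neurons). The function name `max_steps` is made up for the example, and it assumes every neuron-to-neuron hop takes the same time and that the hops happen one after another.

```python
# Figures taken from the paragraph above (Hawkins, 2004).
PATH_TIME_MS = 500     # half a second for a signal to cross the whole path
NEURONS_IN_PATH = 100  # length of that path, in neurons

# Time budget for a single neuron-to-neuron hop, in milliseconds.
ms_per_step = PATH_TIME_MS // NEURONS_IN_PATH


def max_steps(task_time_ms):
    """Longest chain of one-after-another steps that fits in task_time_ms."""
    return task_time_ms // ms_per_step


print(ms_per_step)     # 5
print(max_steps(500))  # 100
```

So anything the brain manages to do in half a second has to fit into a chain of about a hundred sequential steps, however many neurons are working side by side.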

That efficiency comes from the way the patterns are organised. The main features of these patterns are that:

1. Patterns are stored in sequences, so one part of a pattern lets us recall the next part of a pattern (recalling the previous part of the pattern is much harder – what letter comes before X?)

2. Patterns are stored in a hierarchy. Smaller patterns are grouped into larger patterns, and those larger patterns are then grouped into even larger patterns. An obvious but coarse example is reading, where parts of a shape are grouped to form a letter, letters are grouped into words, and words are associated with a meaning which may depend on the other words around them. The sentence “She had a tear in her dress and a tear in her eye” has the same spelling for “tear” but two different pronunciations and meanings based on the surrounding words.

3. Patterns are able to trigger memories of themselves, enabling us to supply missing inf_rm_t__n.   We can recognise a tune even if we start hearing the tune in the middle of the tune. If we are reasonably familiar with the tune we can hum or sing along, predicting the next notes in the sequence. (However we often have to get to the end of the melody before we can recall the beginning notes of the tune, because the notes are remembered in sequence).

4. Patterns are stored in an invariant form. This means we can recognise that an animal is a dog regardless of breed, or whether it is a live dog, a statue or a photo of a dog, and it doesn’t matter whether the dog is facing us, side on, sitting down or standing up. Predictions are based on these invariant forms. From the invariant pattern for a dog we know that dogs can bark. The invariant form for a photo is that it is a static image, so a photo of a dog will not bark. We recognise a tune regardless of whether it is played on a single instrument, played by an orchestra, or hummed. It doesn’t matter if it is played in a different key, or at a slightly different tempo, we still recognise the tune. Our brains automatically focus on the most relevant information, and ignore unnecessary details.
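Three of these features (sequence, auto-completion and invariance) are easy to imitate in a few lines of Python. This is a loose analogy rather than a model of the cortex, and all the names here (`NEXT_LETTER`, `letter_before`, `complete`, `intervals`, `VOCABULARY`) are invented for the illustration. The tune is written as MIDI-style note numbers, which is also an assumption. The hierarchy feature is left out to keep the sketch short.

```python
import re
import string

ALPHABET = string.ascii_uppercase

# Stored as a sequence: each letter links only to the letter that follows it.
NEXT_LETTER = dict(zip(ALPHABET, ALPHABET[1:]))


def letter_before(letter):
    """No backward link is stored, so replay the sequence from the start."""
    for earlier, later in NEXT_LETTER.items():
        if later == letter:
            return earlier
    return None


def complete(partial, vocabulary):
    """Auto-completion: fill '_' gaps by matching against known words."""
    pattern = re.compile(re.escape(partial).replace("_", "."))
    return [word for word in vocabulary if pattern.fullmatch(word)]


def intervals(notes):
    """Invariant form of a tune: the steps between notes, not the notes."""
    return [b - a for a, b in zip(notes, notes[1:])]


VOCABULARY = ["information", "inflammation", "transformation"]
tune_in_c = [60, 62, 64, 60]  # the same four notes...
tune_in_d = [62, 64, 66, 62]  # ...played two semitones higher

print(NEXT_LETTER["X"])                        # Y (one direct lookup)
print(letter_before("X"))                      # W (found only by replaying)
print(complete("inf_rm_t__n", VOCABULARY))     # ['information']
print(intervals(tune_in_c) == intervals(tune_in_d))  # True
```

Going forwards is a single lookup, while going backwards means running through the alphabet again, which is roughly how it feels when someone asks what comes before X.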

So the brain is able to recognise information based on previous patterns it has encountered, can use those patterns to provide extra information to the current situation, match the relevant information to a more generalized form of information, and use that information to make predictions. A real life example might be as follows. You hear a sound, and recognise the sound is a dog barking close by (pattern recognition enabling you to progress through a pattern hierarchy to associate the sound with a dog). You also recognise that the sound of the bark more closely matches the sound of a large dog barking, so you know it is likely to be a large dog (providing missing information), and you know that most dogs are very friendly, but sometimes dogs bite (from your invariant form for dogs). You might combine this with information you have just gained from another hierarchy of patterns, the one that allows you to read and understand a nearby sign saying “Guard dog on premises”, and reach the sensible conclusion that climbing over the fence to retrieve a ball might not be a good idea.

So a concept is a micro-circuit in our brain cells, created by recognising patterns which are then generalised into an invariant form. That micro-circuit continually makes predictions which become more accurate over time as the micro-circuit modifies itself (learns) in response to feedback about the accuracy of its predictions.

There’s a concept for you to think about!


Related articles:
Why can’t a computer be more like a brain: Jeff Hawkins

Hierarchical Temporal Memory: Wikipedia

References:

Friston, K. (2011). Prediction, perception and agency. International Journal of Psychophysiology. doi:10.1016/j.ijpsycho.2011.11.014

George, D., & Hawkins, J. (2009). Towards a mathematical theory of cortical micro-circuits. PLoS Computational Biology, 5(10), e1000532. doi:10.1371/journal.pcbi.1000532

Hawkins, J. (2004). On Intelligence. New York, N.Y.: St. Martin’s Griffin.

Hawkins, J., Ahmad, S., & Dubinsky, D. (2011). Hierarchical Temporal Memory including Cortical Learning Algorithms. Retrieved from http://www.numenta.com/htm-overview/education/HTM_CorticalLearningAlgorithms.pdf

Linhares, A., Chada, D. M., & Aranha, C. N. (2011). The emergence of Miller’s magic number on a sparse distributed memory. PLoS ONE, 6(1), e15592. doi:10.1371/journal.pone.0015592

Scratch programming fun

While it’s often assumed that programming is difficult to learn, many 8- to 14-year-olds are now learning programming skills using Scratch, a specially designed programming tool developed at MIT. Scratch is a free program that runs on Mac, Windows and Linux. First released in 2007, it now has an extensive following of students around the world. The developers saw Scratch not just as a programming tool, but as an opportunity for students to explore and be creative with computers. Although today’s digital natives can ‘read’ computers, until now most wouldn’t have had the first idea about ‘writing’ for computers – i.e. programming. That might change as more students discover Scratch.

Learning to program has so much going for it that it’s probably only a matter of time before it becomes regarded as an essential part of learning for children of all abilities. Not only does it develop logical reasoning and problem solving skills, but programming can be creative and challenging at a number of ability levels. Since programming lets students create projects connected to their own particular interests, projects are more likely to be ones that students find relevant and meaningful, and so more likely to have the motivation ingredients found in a self-directed learning activity. Perhaps best of all, programming fosters a healthy attitude to mistakes and setbacks. Reviewing and reflecting on results are fundamental programming skills. Last but not least, programming can also be a lot of fun.

Why don’t students like school?

The title of this excellent book by cognitive scientist Daniel Willingham is perhaps a little misleading – there’s not much in it about why students don’t like school; it’s actually a concise list of nine principles about how the brain learns that can be applied in the classroom. The result is a practical and easily readable introduction to research-based cognitive psychology, along with practical suggestions for applying the theory to the classroom.

In a nutshell, here are Willingham’s nine key points:

1. People are naturally curious, but not naturally good thinkers: unless the cognitive conditions are right, we will avoid thinking.

We enjoy mental activity, and solving problems brings pleasure, but only when the problem is appropriately challenging – not too simple, and not so difficult that it is frustrating. Appropriate levels of difficulty will engage students provided they have access to enough information to solve the problem. Cognitive conflict is a great way to stimulate thinking. For example: if 1/2 plus 1/4 really does equal 2/6 (a pretty common assumption amongst those who are struggling with fractions), then why is the answer (2/6) smaller than 1/2? It’s also a good way to develop a useful metacognitive skill: checking for yourself that the answer makes sense.
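For anyone who wants to see the fractions conflict played out, here is a small Python sketch using the standard `fractions` module. The helper name `plausible_sum` is made up for the example, and the check it performs (the sum of two positive fractions must be bigger than either of them) is just one possible “does the answer make sense?” test.

```python
from fractions import Fraction


def plausible_sum(a, b, claimed):
    """Sanity check: adding two positive fractions gives more than either."""
    return claimed > a and claimed > b


half, quarter = Fraction(1, 2), Fraction(1, 4)
wrong = Fraction(1 + 1, 2 + 4)  # the "add the tops, add the bottoms" answer, 2/6
right = half + quarter          # the correct answer

print(right)                                # 3/4
print(plausible_sum(half, quarter, wrong))  # False
print(plausible_sum(half, quarter, right))  # True
```

The check doesn’t say what the right answer is. It only flags that 2/6 can’t be it, which is exactly the kind of conflict that gets a student thinking.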

Monster problem for working memory

The following problem¹ demonstrates the impact of working memory limitations on processing information. Although there is no difficult conceptual thinking required to solve the problem, it has been reported that most university students require about 30 minutes to find the solution.


Three monsters, one small, one medium and one large, were each holding a globe. The globes came in three sizes only – small, medium and large, and each globe could be expanded or shrunk repeatedly to any one of these sizes but to no other size.

The small monster was holding the medium globe, the medium monster was holding the large globe, and the large monster was holding the small globe. They could change the size of the globes according to the following rules:

  1. Only one globe could be changed at a time.
  2. When two globes are the same size, only the globe held by the larger monster may be changed.
  3. A globe must not be changed to the same size as the globe of a larger monster.

What sequence of changes would allow the monsters to hold globes proportional to their size?


Although each globe-changing rule is easy enough to understand, the problem is difficult because it is difficult to hold the rules in working memory. The original creators of the problem (Kotovsky, Hayes, & Simon, 1985) found that memorizing the rules of the problem to the point where they could be repeated effortlessly made the problem much easier to solve.

While it is easy to assume that a student can’t solve a problem because he/she doesn’t understand what needs to be done, the monster problem indicates that working memory limitations are just as likely to be the cause of the difficulty.
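A computer has no trouble holding three rules in memory, so the monster problem is easy work for a short program. The Python sketch below is a spoiler, so try the problem first. It is an assumption-laden illustration, not anything from Sweller or the original authors: the state encoding, the names `legal_moves` and `solve`, and the use of a breadth-first search are all choices made here. Rule 2 is read as applying only to the two globes that are the same size (of those two, only the larger monster’s may change).

```python
from collections import deque

SIZES = ("small", "medium", "large")
# A state lists the globe size held by the (small, medium, large) monster.
START = (1, 2, 0)  # small holds medium, medium holds large, large holds small
GOAL = (0, 1, 2)   # every monster holds a globe of its own size


def legal_moves(state):
    """Yield (monster, new_size, new_state) for every allowed change."""
    for monster, old_size in enumerate(state):
        larger_monsters_globes = state[monster + 1:]
        # Rule 2: a larger monster holding the same size blocks this change.
        if old_size in larger_monsters_globes:
            continue
        for new_size in range(3):
            # Rule 3: the new size must not match a larger monster's globe.
            if new_size == old_size or new_size in larger_monsters_globes:
                continue
            # Rule 1: exactly one globe differs between state and new_state.
            new_state = state[:monster] + (new_size,) + state[monster + 1:]
            yield monster, new_size, new_state


def solve(start=START, goal=GOAL):
    """Breadth-first search, so the first solution found is a shortest one."""
    queue = deque([start])
    came_from = {start: None}
    while queue:
        state = queue.popleft()
        if state == goal:
            break
        for monster, new_size, new_state in legal_moves(state):
            if new_state not in came_from:
                came_from[new_state] = (state, monster, new_size)
                queue.append(new_state)
    else:
        return None  # ran out of states without reaching the goal
    steps = []
    while came_from[state] is not None:
        state, monster, new_size = came_from[state]
        steps.append((SIZES[monster], SIZES[new_size]))
    return steps[::-1]


for monster, new_size in solve():
    print(f"The {monster} monster changes its globe to {new_size}.")
```

Under this reading of the rules the search finds a five-step answer. The program never “forgets” a rule halfway through, which is the whole difficulty for a human solver.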

1. From “Instructional Design in Technical Areas” by John Sweller (1999).

see A Deep Understanding of Memory

A deep understanding of memory

In his book “Why Don’t Students Like School?”, psychologist Daniel Willingham says that understanding is memory in disguise. Although this seems the exact opposite of the widely adopted strategy of making information memorable by making it understandable, his point is that memory and understanding are a pigeon pair. Both are necessary for learning – memory improves as understanding improves, and understanding improves as memorizing improves. Any teaching strategy that neglects the role that memory plays in understanding is likely to be one where the students find conceptual understanding of the topic elusive.

Yup – I’m saying sometimes memorizing is an essential part of understanding,   and sometimes it needs to come before you can understand enough to learn.

Yup – I know that’s not what they teach in teaching college, but then they don’t teach much cognitive science in teaching college either. Sure, plenty of Piaget, Vygotsky, Bruner and Gardner, but only a smattering of neurones and synapses.

Here it is – the crash course in cognitive science – a.k.a.  “Your Memory, and Why It’s Important to Know More About It”.

Message understood doesn’t always mean message is remembered.

The brain apparently handles understanding (processing information) and memorizing (storing information) in totally different ways.   Although (fortunately) it doesn’t happen very often,  it is possible to have brain damage which makes it impossible to create new memories. Such an individual is able to reason and understand using any knowledge from memories acquired prior to the injury, but is unable to create new memories. Any newly acquired knowledge obtained from logical reasoning and understanding of already known information will not be remembered for more than a few minutes.

The long and the short of memories

Most theories about how brains think, reason, calculate and memorize involve the concept of two types of memory – working memory (sometimes referred to as short-term memory) and long-term memory. Although not yet fully understood, current theories  about the interaction of these two memory types can help in the creation and design of more effective learning experiences for students, particularly those students with learning difficulties.

Hooked on maths

Challenge: Is it possible to find a student who would not be engaged by Vi Hart’s brilliant math doodling videos? These are about having fun while discovering mathematical relationships. Here’s a sample – there’s more on her site.

Doodling in Math: Sick examples


There’s plenty of other mathematical stuff on Vi Hart’s blog. Stuff with balloons, and how to make platonic solids out of fruit. This is mathematics that meets Paul Lockhart’s¹ description of the way mathematics needs to be taught – using playful “serendipitous exploration” to discover that mathematics is about weaving ideas into patterns.

1. Paul Lockhart is the author of the book “A Mathematician’s Lament” (the introduction is online).

Copyright © Computer Chalk