A deep understanding of memory

In his book “Why Don’t Students Like School?”, psychologist Daniel Willingham says that understanding is memory in disguise. Although this seems the exact opposite of the widely adopted strategy of making information memorable by making it understandable, his point is that memory and understanding are a pigeon pair. Both are necessary for learning – memory improves as understanding improves, and understanding improves as memorizing improves. Any teaching strategy that neglects the role that memory plays in understanding is likely to be one where the students find conceptual understanding of the topic elusive.

Yup – I’m saying sometimes memorizing is an essential part of understanding,   and sometimes it needs to come before you can understand enough to learn.

Yup – I know that’s not what they teach in teaching college, but then they don’t teach much cognitive science in teaching college either. Sure, plenty of Piaget, Vygotsky, Bruner and Gardner, but only a smattering of neurones and synapses.

Here it is – the crash course in cognitive science – a.k.a.  “Your Memory, and Why It’s Important to Know More About It”.

Message understood doesn’t always mean message is remembered.

The brain apparently handles understanding (processing information) and memorizing (storing information) in totally different ways.   Although (fortunately) it doesn’t happen very often,  it is possible to have brain damage which makes it impossible to create new memories. Such an individual is able to reason and understand using any knowledge from memories acquired prior to the injury, but is unable to create new memories. Any newly acquired knowledge obtained from logical reasoning and understanding of already known information will not be remembered for more than a few minutes.

The long and the short of memories

Most theories about how brains think, reason, calculate and memorize involve the concept of two types of memory – working memory (sometimes referred to as short-term memory) and long-term memory. Although not yet fully understood, current theories  about the interaction of these two memory types can help in the creation and design of more effective learning experiences for students, particularly those students with learning difficulties.

Long-term memory is what we normally think of when we think about memory. Long-term memory can store a vast amount of information, from short periods of time to decades and even up to a lifetime. Memories in long-term memory are more efficiently retrieved when items of information are grouped into chunks of information known as schemas. A schema might include knowledge related to how to take a photograph with a mobile phone, all there is to know about a particular letter of the alphabet, or moves in a chess opening. This suggests that the valued tools of deep learning, including comparing, contrasting, problem-based learning and discovery learning, are effective because they foster the development of schemas. Those “aha” moments when some item of knowledge seems to click into place probably occur when a link is discovered between two previously unconnected schemas, which can then merge into one bigger schema.

The process of storing a schema is not yet fully understood, though recent research suggests it is likely to involve some type of long-lasting physical change in some part of the brain. More significant is that although long-term memory stores information, it is not consciously aware of the information it stores. (Remembering everything we know at the same time all the time would be overwhelming.)

Working memory – the traffic jam of our minds – actually doesn’t work very well.

Working memory (often also referred to as short-term memory) is the “thinking” part of our brain, used for consciously noticing our environment, or recalling earlier memories. Although working memory processes information, it has limited storage capacity compared to long-term memory. If a computer analogy helps, then imagine that the hard drive is long-term memory, and RAM is working memory. Those moments when information “pops” into your head are working memory alerts advising that a memory has just been retrieved from long-term memory, sometimes long after the search to locate it was initiated. (The really crazy part about this is that the search seems to go on even while you are totally unaware that it is happening.)

The built-in design limitations of working memory are startling.  Each information item in working memory has a shelf life of twenty seconds, and any information not refreshed within those twenty seconds and not stored in long-term memory is lost forever. Furthermore, only seven information items (plus or minus two) can be stored at a time. Attempting to add additional information beyond this capacity either fails, or displaces an existing element. Research suggests that if information needs to be processed or manipulated in some way, working memory capacity is further reduced to an average of four separate items of information. Add fatigue, stress, distraction and the figures are even worse. Working memory problems are associated with many learning difficulties encountered in the classroom.
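The capacity limit described above can be pictured as a small fixed-size store. This is only a toy sketch, not a model of real cognition: the function name `hold` is made up for illustration, and it assumes that the oldest item is the one displaced when a new item arrives (the text only says that adding an item either fails or displaces an existing one).

```python
from collections import deque

CAPACITY = 7  # the "seven, plus or minus two" figure quoted above


def hold(items, capacity=CAPACITY):
    """Return what remains in a fixed-size store after items arrive in order.

    Assumption: when the store is full, the oldest item is displaced.
    """
    store = deque(maxlen=capacity)  # a full deque drops its oldest entry on append
    for item in items:
        store.append(item)
    return list(store)


digits = list("0412345678")      # ten digits, three more than the store can hold
print(hold(digits))              # the first three digits have been displaced
print(hold(digits, capacity=4))  # the reduced capacity quoted for items being processed
```

Try reading a ten-digit phone number once and then reciting it: the digits that go missing are the ones this sketch drops.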

Cognitive load theory stresses the impact of the limitations of working memory, and the importance of structuring learning to reduce the load on working memory. Working memory is really important because that’s where conscious thought happens, so it’s also where understanding happens.  That classic “too many things at once to think about”  feeling is actually “working memory is overloaded and information is not being retained any more.”  What happens to information that’s squeezed out of working memory to make room for other information?  That depends. If enough attention is given to the information, it might make it into long-term memory, where it may or may not be able to be recalled again at a later time. If no attention is given to the information when it is first acquired then the memory of it takes on the zen-like quality of one hand clapping – it doesn’t quite exist and maybe it never did, and it certainly won’t be remembered.

The schema of it all

The limitations of working memory seem so restrictive that it seems almost impossible for any learning to occur. However, there is some flexibility in the system – items being processed in working memory might be single isolated facts, or they might be bundled into a schema of related information items that is processed as though it is one single item of information.

Schemas have two roles. The first is to group related information together, and this information can then hook onto new information, making it more likely that the new knowledge can be recalled. One memory can then trigger another memory, which then triggers another memory, and so on. The second important role of schemas is to make more information available to working memory by requiring only one of the few available slots in working memory, instead of requiring one memory slot for each component of the schema. Once working memory holds all the information in a particular schema, then all the tasks associated with deep learning (analyse, compare, contrast, synthesize etc.) can be undertaken using a much larger cluster of information elements, to create even more complex schemas.

These extensive schemas (schemata if you want to sound really academic) can then be stored in long-term memory for later recall and further processing in working memory as required. This is the deep understanding type of learning that we all know and love. It’s an elegant system of task specialization between working memory (organize schemas) and long-term memory (store schemas), but with one significant catch.

All schemas are not equal

The catch is that schemas, those bundled chunks of related information, can only be handled in working memory as a single unit of information when no conscious effort is required to recall the links between the information in the schema. (That’s a lot of information in one sentence – enough to overload working memory, so you might need to read it a couple of times before you really understand how all those bits of information in that schema fit together – it’s saying that schemas do not save working memory space until they have been memorized extremely well.) The lightbulb message in all this is that understanding is memory in disguise. You need to be able to effortlessly remember information bundled in a schema in order to be able to create and develop improved schemas that will then be stored in long-term memory for later recall. Effortless as in thinking that salt goes with pepper, effortless as in automatically knowing where your fingers go on a piano keyboard or a guitar when you play a particular chord, without having to recall in detail which finger goes on which note. Effortless, as in knowing how to pronounce each word you are reading at the moment, without having to sound out each letter, or recognizing that certain combinations of letters such as “wait” and “weight” sound alike but have different meanings. Effortless, as in knowing that 4 and 6 are both factors of 12, so that it becomes readily apparent why 12 will be the Lowest Common Denominator (or Least Common Multiple) when trying to add fractions with denominators of 4 and 6.
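For anyone who wants to see the arithmetic behind that last example spelled out, here is a small sketch. The fractions 1/4 and 1/6 are chosen purely for illustration, and `lowest_common_denominator` is a made-up helper name, not something from the references.

```python
from fractions import Fraction
from math import gcd


def lowest_common_denominator(a, b):
    """Smallest positive number that both a and b divide evenly."""
    return a * b // gcd(a, b)


lcd = lowest_common_denominator(4, 6)
print(lcd)                              # 12
print(lcd // 4, lcd // 6)               # 3 2: 1/4 becomes 3/12 and 1/6 becomes 2/12
print(Fraction(1, 4) + Fraction(1, 6))  # 5/12
```

A student who already knows effortlessly that 4 and 6 both go into 12 skips the first step entirely, which leaves working memory free for the addition itself.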

Because schemas that are not effortlessly recalled require more space in working memory, less working memory space is available for processing information. Information then has to be transferred more frequently between long-term memory and working memory, which is both time consuming and error-prone. The additional time required and the more frequent mistakes made by someone learning new material, before the relevant schemas are automated, are an all too common example of this scenario. In some situations working memory may be so overloaded that it is not possible to do the thinking required by the task. (For an example of the difficulty of performing an easily understood task with an overloaded working memory, see Monster Problem for Working Memory.)

Automation makes all the difference

The cognitive science term for this effortless recall is automation. Automation [2] is the ability to process the schema information automatically, without conscious effort. According to cognitive load theorists, once a schema is automated, that schema only requires one slot in working memory. If a schema is not automated, it is broken down into its information components, and each component requires one of the limited number of slots in working memory. Jason might understand that 5 x 3 = 15, because he can count on his fingers, count by fives, or may even understand why 10 x 3 divided by 2 would give the same answer, but until the schema for 3 x 5 = 15 is automated, it will require three slots in working memory – one for the 3, one for the 5, and one for the 15. All three slots will be required for Jason to find the factors of 15. Once Jason knows that 3 x 5 = 15 without having to do any conscious thinking, the schema has become automated. All the information in the schema, the 3, 5, 15 and multiply, now only requires one slot in working memory.
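The slot counting in Jason’s example can be written down as a tiny sketch. It is a bookkeeping illustration of the claim above, not a cognitive model; the function name `slots_needed` and the four-facts scenario are hypothetical.

```python
def slots_needed(schemas):
    """Count working-memory slots under the model described above.

    Each schema is a (components, automated) pair. An automated schema
    takes one slot; otherwise every component takes its own slot.
    """
    return sum(1 if automated else len(components)
               for components, automated in schemas)


fact = ("3", "5", "15")                   # the three items counted in the text
print(slots_needed([(fact, False)]))      # 3: not yet automated
print(slots_needed([(fact, True)]))       # 1: automated

# Hypothetical: a problem that needs four such facts held at once
print(slots_needed([(fact, False)] * 4))  # 12
print(slots_needed([(fact, True)] * 4))   # 4
```

Twelve slots is well past the seven or so available, while four fits comfortably. On this view, that gap is the difference between a student who can see the whole problem and one who can’t.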

Automation is the vitally important no-thinking part of understanding. Once a schema is automated, you don’t have to clutter up working memory with the details of it anymore. No thinking is required because all the details of the schema are automatically available to working memory when any part of the schema is recalled into working memory. All the usual deep learning techniques that activate prior knowledge and make connections between pieces of knowledge are ideal for creating schemas, but conceptual understanding, the holy grail of learning, requires an interplay between information already in working memory. The more information available to working memory, the more conceptual understanding can occur, and the only way to make more information available to working memory is to have automated (i.e. memorized) schemas. Students who don’t “get it” don’t have all of the information they need in their heads at the same time to be able to see the big picture. (Concrete learning aids are great because they reduce working memory load.)

Alas, as yet there is no known shortcut to schema automation. All the evidence so far indicates that the prime requirement for automation to occur is extensive practice and repetition. Very extensive practice is even better. It doesn’t matter whether this extensive repetition is achieved by explicit repeated practice and drilling, or implicitly through activities that repeatedly involve the schema that needs to be automated (computer games are fine), so long as there is plenty of repetition. It’s perhaps ironic that the dreaded mindless repetition does seem to be necessary for automation to occur. It’s also counter to the common wisdom that memorizing (rote-learning) is an inferior alternative to learning by understanding. In some instances automation – i.e. memorization – is a prerequisite to learning – or at least a co-requisite.

How shall I remember thee? Let me count the ways!

The obvious question then is what is the most effective type of practice and repetition necessary for automation of schemas to occur. Practicing retrieving the information is likely to be more effective than repeating the information without trying to retrieve it, which suggests that self-testing via flash cards is likely to be better than reciting the questions and answers with the answers immediately supplied. It seems that anticipating being tested on recall of the information (gasp &*$% the “T” word) makes it more likely that the information will be recalled. In some studies, actually being tested improves recall on subsequent tests, even without further practice between tests. Sleeping immediately after trying to memorize something also improves recall, regardless of what time of day the sleep is taken. Repetition also helps, and spaced repetition of small amounts in short practice sessions is better than tackling large amounts in longer recall practice sessions. Information practiced with spaced repetitions is more likely to be remembered for a longer period of time.
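One common way to combine flash-card self-testing with spaced repetition is a Leitner-style box system. That system is not something named in the paragraph above; it is brought in here only as an illustration. In the sketch below the number of boxes and the doubling intervals are hypothetical choices, not figures taken from the research.

```python
def next_box(box, recalled, max_box=5):
    """Move a card up one box after a successful recall, back to box 1 after a miss."""
    return min(box + 1, max_box) if recalled else 1


def review_interval_days(box):
    """Hypothetical spacing: box 1 daily, then doubling (1, 2, 4, 8, 16 days)."""
    return 2 ** (box - 1)


# Follow one card through four self-tests: right, right, wrong, right
box = 1
for recalled in [True, True, False, True]:
    box = next_box(box, recalled)
    print("box", box, "- review again in", review_interval_days(box), "day(s)")
```

The point of the design is that every review is a retrieval attempt, and the cards that are hardest to recall come around most often.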

Her schema is better than your schema – that’s why she’s an expert.

With this concept of learning, the difference between the struggling student and the high-ability student can be seen as similar to the differences between a novice and an expert in a particular field of knowledge. Not only do the capable student and the expert have more extensive schemas available, but more of those available schemas are automated. The novice’s expertise will improve when the required schemas become more automated, making more information simultaneously available to the conscious thought processing of working memory.

Two is better than one

So it seems that cognitive research indicates that some memorizing is required for deep understanding to occur. Understanding helps create the schema, but memorizing the schema makes it more likely that the knowledge in the schema can then be used for deeper understanding. Students who avoid memorizing the basic schemas for any given topic pay the penalty of increased difficulty in acquiring a deeper understanding of that topic. Deep learning is the result of both understanding AND memorizing, not understanding instead of memorizing. Like the proverbial horse and carriage, you can’t have one without the other, but unlike the horse and carriage, which one comes first is not so critical. Rigid adherence to one type of learning, be it understanding or memorizing, at the expense of the other is likely to be less effective than using understanding and memorizing in tandem.



1. Artino, A.R., Jr. (2008). Cognitive load theory and the role of learner experience: An abbreviated review for educational practitioners. AACE Journal, 16(4), 425-439.

2. Sweller, J. Cognitive load theory, learning difficulty, and instructional design.

Research into Cognitive Load Theory and Design instruction at UNSW is an easily readable summary of cognitive load theory, and includes how to apply cognitive load theory to improve learning outcomes.
