Applying Cognitive Load Theory to English Part 5: Combining the Alternation Strategy and The Problem Completion Effect-English Language GCSE

This is the fifth post looking at the application of Cognitive Load Theory to teaching English. The first four posts can be found here: one, two, three, four.

This post will explore how I have applied ‘backwards fading’ to teaching GCSE Language reading questions.

Taken from ‘Efficiency in Learning’ by Clark et al, this diagram explains the idea of ‘backwards fading’.

backwards fade

GCSE English Language has unseen extracts, meaning that it is easy to claim that there is no fixed or defined domain of content knowledge to be taught. Extracts can vary widely and students who are successful are often those who read widely outside of class, have developed vocabularies and possess good general knowledge. So how can we prepare students for this exam? There is no quick fix here. Students need a knowledge-rich five-year curriculum that teaches vocabulary explicitly and unlocks challenging and canonical works of literature. Lessons should involve lots and lots and lots of reading to develop fluency and accuracy and to promote the building of background knowledge. Students also need to be taught accurate and sophisticated writing, beginning at the sentence level.

Despite it being utterly tedious, the one thing that we can teach about this exam is the procedural knowledge for approaching the questions. Like the directions to an important job interview, being able to recall and apply this knowledge is of vital importance….for a few hours….on one day. After that, it is largely useless.

When we begin teaching the language exam in year 11, we give students a procedural knowledge sheet that explains exactly how to answer each question, complete with timings, and often including diagrams to help the lowest attainers understand what they need in their answer. We ask them to memorise this information and regularly test them on it in the hope that if they know the recipe for a good answer, they are more likely to produce what is required. I want them to instantly know what approach to take in the exam: automatic and precise recall of this procedural knowledge is crucial for success and ensuring they answer the question correctly.

We use exactly the same backwards fading and instructional approach for all of the GCSE Language questions, the aim being to minimize extraneous cognitive load so that students are able to focus completely on practicing and learning the required approach for each question. We use the ‘Alex Cold’ extract from the AQA sample assessment materials and the exemplar scripts that come with it. When introducing a new question type, we look at these models. When we introduce question 3, students have already read the extract several times and have practiced question 2. By this point, they are already familiar with the extract, meaning that they can direct their full attention to the specific requirements of the question. Although students can groan when presented with the same reading extract, complaining that it is boring or that they have done it already, if they were presented with a new extract each time they attempted a new question, there would be a greater chance that they would reach the limits of their working memory capacity, experience cognitive overload and become confused as they try to juggle far too much new information at once.

Here is how I have combined the alternation strategy with backwards fading to teach AQA GCSE English Language Paper 1, Question 3 to a low attaining class:

Lesson 1: Present and Label Examples and Non-examples

The teacher reads the question and explains the important parts that are underlined here in red:

question 3 annotated question

The teacher reads the entire model answer. They can then draw a rough answer structure underneath the model:

q3 answer chain

These answer diagrams can be really useful. Not only do they tell students what they should include, but they can also be used for retrieval practice, allowing students to test themselves on the procedural knowledge for each question.

As this is the first time students have encountered this answer, the teacher will then annotate the entire model answer, using the diagram above as a key and narrating their thought process throughout. The purpose of this lesson is for students to develop a conception of what a typical question 3 answer looks like. Explicitly telling them what is required is probably far more efficient than letting them discover what is required for themselves, the latter approach potentially resulting in confusion and the embedding of misconceptions. Later on in the instructional sequence, when students have a firmer conception of what is required and a more developed ‘question 3 schema’, they will be asked to do more of the steps for themselves, eventually resulting in independent problem completion.

Here is an annotated Alex Cold model. The answer is a typed up version from the sample assessment materials:

q3 model

After they have annotated the exemplar of quality, they would then look at a substandard model answer, a ‘non-example’ that contains common errors and features that you would associate with poor responses. Like the previous model, we have taken this from the sample assessment materials. It contains rubbish like ‘In my opinion, this is a really good way to get the reader’s attention because it makes me stop and think about what is going on’ and ‘The writer uses language techniques such as verbs, adjectives, similes’ and ‘It made me understand the text more’.

Students then open their books and draw the same answer diagram from before at the top of a new page, this time from memory. The teacher then writes down some ineffectual phrases (taken from the poor quality model) as well as any others that are relevant (They have done this to make it sound interesting/This makes me want to read on etc), ensuring that students know what not to include. Students are provided with some structure sentence stems to use:

structure phrases

Students then write an answer of their own while the teacher writes another example answer under the camera, providing a further model for the weaker students to read and use for inspiration. This is the application of the alternation strategy: students have studied a worked example and can use it as an analogy or a support when attempting their own response.

One of the criticisms directed at model answers is that they are often plagiarized by students, preventing them from thinking for themselves and coming up with their own ideas. Because students have seen a model answer in this first lesson, it is inevitable that their answer will look very similar to the example that they studied. In this case, I don’t think that this is a problem: the aim of this initial lesson is for students to gain a solid understanding of the structure of the answer and what is required in their response. I want them to have a high initial success rate and the models and sentence stems allow this to happen.

Lessons 2, 3 and 4: Completion Problems

Like many lessons, these would begin with a retrieval quiz, asking questions about Literature texts, vocabulary, quotations and anything else that we have covered. I would include a number of questions about the preceding language paper lessons:


  • How long should you spend on Q2?
  • Draw an answer diagram for a Q2 answer
  • How long should you spend on Q3?
  • Draw an answer diagram for a Q3 answer
  • Write down four ‘structure phrases’ that you could use
  • Write down four banned phrases.

All of this information is crucial for student success. I want them to have automatic and flawless recall of the requirements that stay constant (the procedural knowledge for the exam) so they can focus on what we cannot predict or teach directly: the extract itself.

Ideally, before this lesson, the teacher would have checked the books and listed some misconceptions from the student responses. These can be dealt with at the start of the lesson through explicit teaching. Here are some examples from recent lessons that I taught:

  • A student has mistakenly written about the effect of language in their question 3 response.

The teacher presents the problematic response:

At the beginning of the extract, Alex dreamt about ‘an enormous black bird’ which has connotations of death and suffering.

The teacher then rewrites it under the visualizer, narrating their thought process and making it clear why this is an improvement:

At the beginning of the extract, Alex dreamt about ‘an enormous black bird’, allowing the reader an insight into Alex’s emotional trauma. When he snaps at his sister at the breakfast table, we can understand the reasons for his irritable and aggressive behaviour.

The teacher then presents a further problematic response:

When Alex gets up, the writer focusses on his thoughts as he thinks that it will be a ‘terrible’ day, making it sound pessimistic, negative and despondent

The student can then rewrite this, using the first worked example as an analogy. If needed, several more of these worked example-problem pairs can be presented and attempted.

Engelmann believes that teachers should look at ‘mistakes for qualitative information about what you need to change in your instruction to teach it right.’ Following Engelmann’s approach, these errors are a fault of the instruction, not the student, and next time I teach this, I will include this task in the initial teaching sequence, hopefully ensuring that fewer students make these mistakes. This is why the curriculum is never entirely complete: we should be constantly taking evidence from student responses and using it to inform changes and adaptations in our teaching sequences.

For each of the ‘Completion Problem’ lessons, we look at a different extract. By this point, students should already have a firm understanding of the structure of an answer and as a result, can apply this knowledge to a new extract. We read the extract and then the teacher explicitly tells them which parts of the extract to underline and why, again, narrating their thought process throughout. Instead of studying an entire worked example like the previous lesson, students are presented with half of a model answer.

half model q3

In this half model, a teacher might start to annotate the specific parts of the answer (the Number 1 bracket) and then ask the student to do the rest. The student is able to use the teacher’s annotated part as a worked example, helping them with their own annotations.

This ‘half-model’ is a completion problem and students are asked to complete the answer themselves. They can use the annotated ‘half-model’ as a guide that demonstrates the structure of their answer as well as exemplifying the level of quality and depth that is required.

Lesson 5: Problem

After beginning the lesson with whole class feedback, retrieval practice or both, I would then ask students to attempt a question 3 independently, initially splitting the task into logical, sequential steps. Breaking the task into smaller steps allows the teacher to give more precise feedback and to pinpoint misconceptions. If the task was not split up like this, it would be harder to diagnose why a student is struggling.

  1. Read the extract and question 3: underline relevant parts in the text that you plan to write about.
  • Getting feedback here prevents students from making poor choices regarding their textual references.
  • You can ensure that students have broadly included something from the beginning, middle and end, or at least a reasonable section of the source.

  2. Ask students to draw the answer diagram from memory at the top of the page.
  • If students cannot do this, it is clear that they have not retained the procedural knowledge needed for the question.
  • After they have finished writing, you can ask them to self-assess their answer, checking that it contains all aspects in the answer diagram.

  3. Students then complete their answer in silence, allowing the teacher to circulate and give live feedback.

Later Lessons: Building up stamina and combining with other questions

After massed practice on question 3, we combine it with other questions, slowly building up towards an entire paper.


  • Teach and Practice Q2
  • Teach and Practice Q3
  • Cumulative Practice: Q2 and Q3
  • Teach and Practice Q4
  • Cumulative Practice: Q2, 3 and 4
  • Teach and Practice Q5
  • Cumulative Practice: Q2, 3, 4 and 5

General Points:

The sequence of lessons above uses ‘backwards fading’ to slowly move students from studying worked examples to completion problems to independent practice. The speed at which a class progresses through this continuum is determined by the level of success experienced at each stage. As a rule of thumb, I would not change stages, thereby reducing the scaffolding and support, until students have achieved a very high success rate. Often, we rush through the worked example and completion stages, enthralled by the promise of independence and keen to allow our students autonomy and agency. We should not confuse methods with goals: independence is a goal and is best achieved through the methods of explicit instruction, scaffolding and support, not through repeated independent practice.

Every year, when I teach students how to approach the language questions, I know that even if they are able to master what is required for each question and successfully apply the procedural knowledge, the unseen nature of the exam means that they still might not do very well. If a student doesn’t read and has a limited knowledge of the world outside their immediate existence, then the probability of them fully understanding an extract is low. Although I earlier claimed that GCSE English Language has no fixed domain of knowledge, perhaps it is more accurate to claim that it does: the domain is reality itself (or at least the parts of reality that a literate teenager with a reading age of fifteen would be expected to understand).

Next Post: Applying Cognitive Load Theory to English Part 6








Applying Cognitive Load Theory to English Part 4: Combining The Alternation Strategy and The Problem Completion Effect-Examples from teaching writing.

This is the fourth post looking at the application of Cognitive Load Theory to teaching English. The first three posts can be found here: one, two, three.

Teaching students how to write is an essential part not just of teaching English, but of teaching other subjects too. Over the last few weeks, an excellent series of blogs has been written by science teachers exploring the importance of written expression in their subject and these posts are worth a read whatever your subject specialism. Like them, my teaching has also been heavily influenced by The Writing Revolution, a book that explains the importance of explicitly teaching students to practice and apply specific sentence structures to their writing, thereby developing not only their range of expression, but also the precision and sophistication of their written output. When teaching writing explicitly, both at sentence and paragraph level, the use of worked examples and backwards fading can help students achieve a high success rate as well as ensuring that learning is as efficient as possible.

Here is a conceptual model of this process from Efficiency in Learning by Clark et al:

backwards fade

This post will explore the application of Cognitive Load Theory and specifically backwards fading to the teaching of noun appositives.

One of the sentence constructions explained in The Writing Revolution, noun appositives allow students to add extra detail into their writing. This blog post explains their form and usage in more detail.

Although I would have begun the teaching of appositives with multiple lessons that contain sequences of examples and non-examples, allowing students to see the range and limitation of the construction as well as anticipating any common misconceptions, what follows demonstrates the application of the worked example effect and backwards fading to an instructional sequence.

Lesson 1 Step one: Present and label examples

  • A poem that denounces exploitation, London conveys the omnipresence of suffering.
  • Ozymandias, a poem that highlights the transience of human power, demonstrates that nobody is immortal.
  • A callous capitalist who disdains collective responsibility, Mr Birling is only concerned with ‘lower costs and higher prices’.

Depending on the class, this may be entirely teacher led or may involve a series of questions asking students to help with labeling: What is the subject of the sentence? Where is the verb? What is London? It is a poem that denounces exploitation. What is Ozymandias? It is a poem that demonstrates the transience of human power.

As mentioned in this post, using arrows, prompts and labeling can help make the implicit interactions and relationships between different components obvious to students.

Lesson 1 Step two: Begin further examples and ask students to orally complete them

Teacher writes this under the visualizer:

  • A manipulative woman who berates her husband,

The teacher can then ask a number of retrieval practice questions like Who does this describe? What does manipulative mean? What does berate mean? Why does she do this? With a weaker class, the teacher may want to give several completed oral examples before asking students to attempt their own. Students can then complete the sentence orally, a task that allows students to experience success before even attempting to write their own answers. Stronger students can offer ideas first, allowing weaker students to hear further examples before attempting it themselves. Asking students to narrate the punctuation in their spoken sentence (A manipulative woman who berates her husband COMMA Lady Macbeth…) helps to draw attention to the necessity of punctuation in the construction.

To further focus the practice or raise the level of challenge, the teacher can ask students to include specific things in their completed oral sentences.


  • A manipulative woman who berates her husband,


  • ‘pour my spirits in thine ear’
  • sinister

Lesson 1 Step three: Begin further examples and ask students to complete them in writing

Students are presented with a series of half completed examples and are asked to complete them. With weaker classes, it may be useful to complete an example or two under the visualizer, narrating your thought process so that students know exactly what they are supposed to do.

  • A tyrannical and vainglorious King, Ozymandias
  • Hyde, a sadistic character who…………………………………………………., seems
  • A criticism of……………………………….., An Inspector Calls encourages the audience to
  • The archetypal Victorian gentleman, Utterson

Because of the restricted nature of these completion problems, precise and immediate feedback can be given by the teacher. Students can be asked to read out their sentence, again narrating the punctuation so that the teacher (or students, for that matter) can ascertain whether it is correct or not. This is a much faster feedback loop than taking all the books in and marking them. The teacher can then ask further questions to the student who gave the sentence (or different students) about the completed construction in order to draw attention to the function of the appositive or to provide further retrieval practice about the content of the sentence.


Student: The archetypal Victorian gentleman COMMA Utterson is ‘austere’ and secretive COMMA avoiding fun at all costs.

Possible Teacher questions: Read out just the appositive. Who does it rename? What does archetypal mean? Which word in the sentence is a quotation? What does ‘austere’ mean? What specific things does Utterson do that make him ‘austere’? Is he always like this? Why is his secrecy a form of duality?

If the sentence contains a weak point, the teacher (or other students) can make suggestions for improvement:

What is a better word for ‘fun’? Frivolity. Crucially, all students are able to answer this as the question is asking them to apply vocabulary that has been explicitly taught in previous lessons.

As with the oral completion exercises, the level of challenge can be raised by asking students to include specific things in their answers, helping them to make links between different bits of knowledge.


  • The archetypal Victorian gentleman, Utterson


  • ‘repressed’
  • ‘never lighted with a smile’
  • contrived socializing with Enfield

In Expressive Writing 2, a Direct Instruction writing scheme, most lessons end with a single or multi-paragraph piece of writing and students are asked to complete a series of precise checks when they have completed their writing in order to ensure that they have applied a necessary skill and avoided making common, careless errors. This approach is also useful in the everyday classroom: instead of asking students to check their work (a vague statement that may be interpreted by students as ‘skim read a bit of it’), it can be useful to specify exactly what they are checking for. These common errors can either be based upon what is being taught (the most likely mistakes that students will make when attempting the task) or they can be based upon the class that you teach, having been chosen as a result of feedback to the teacher: if a weaker class are, on average, bad at using full stops, then include that as a check. As in Expressive Writing, I would ask students to complete one check at a time in order to prevent cognitive overload.

Possible example checks for appositives:

CHECK 1: Does your appositive rename a noun that is right beside it?

CHECK 2: Is your appositive separated from the rest of the sentence with a comma or pair of commas?

CHECK 3: Does your sentence end with a full stop? (This check would be unnecessary for more proficient students!)

You may have noticed that I have still not asked students to complete problems on their own and this is deliberate. I want students to experience quick, initial success with writing these structures and spending time and effort on worked examples and completion problems allows this to happen. By doing so, students (even weaker ones) can experience high initial levels of success, something that both Engelmann and Rosenshine have identified as being of crucial importance in instructional sequences. If students are successful, this raises their motivation levels: even the most apathetic students can become enthused if it is clear that they can succeed.

One lesson is not enough, however, and instruction should continue in a track system across multiple lessons so that students develop automaticity. The instructional sequence should aim to broadly follow the idea of backwards fading, slowly removing prompts and scaffolding so that students eventually apply their knowledge independently and without support. Finally, tasks should slowly change from isolated and decontextualized sentence practice to wider application within paragraphs and extended writing.

Here is a rough overview of what might happen in subsequent lessons:

Lessons 2, 3, 4, 5 and 6: Further worked examples and completion problems

Although the first lesson that I described would probably be a dedicated ‘Appositives’ lesson, the subsequent lessons would more likely be practice tasks within lessons that contain many other foci.

While earlier lessons saw the teacher leading the labeling and annotation of the worked examples, as students become more proficient, they can be asked to do this themselves. The completion problems would be less restrictive, giving the students more autonomy and requiring them to complete more of the steps themselves. While the earlier completion problems provided the noun that does the renaming (A tyrannical and vainglorious King, Ozymandias…), the examples below require students to generate it for themselves:


  • Hyde, a…………………………………………………………, is the opposite to
  • A…………………………………………………………, Lady Macbeth
  • A……………………………………………………………, The Prelude

Lessons 7, 8 and 9: Interleaved Completion Problems

These lessons may ask students to complete a range of different sentence styles, interleaving appositive practice with other constructions. In the example below, semi-colon practice is mixed up with appositive practice. As with the appositives, students would have already completed the ‘I and We’ stages with semi-colons before encountering them in this interleaved exercise.


  • A denunciation of…………………………………………….., London……
  • The Inspector admonishes the Birlings for their callousness; he wants
  • Macbeth’s sword ‘smoked with bloody execution’, an image that demonstrates….
  • Lady Macbeth, a manipulative……………………………………………, berates…….
  • Duncan is oblivious to their plans; he…..

Lessons 10, 11 and 12: Independent Problems

At some point, the worked examples become unnecessary and redundant as students will have developed a mental conception of what it is that is being taught. Instead of studying a model, they can retrieve the relevant schema from their long term memory when attempting the task. Asking students to write a few specific sentences is a useful exercise.

Later lessons: Wider application

Once students have demonstrated the ability to accurately produce the construction in isolated drills, they could be asked to write a paragraph that contains noun appositives as one of the success criteria. Currently, I am teaching a year 9 class to write analytical essay introductions using noun appositives. They broadly followed a similar instructional sequence to the one above and, as a result, they have achieved a high success rate with this wider task. Later still, they could be asked to apply the constructions to more extended essay type answers.

General Principles

How long should you spend on each stage of the backwards fading continuum?

If we are to maximize efficiency in our instructional sequences (an important goal given that lesson time is finite), then we need to carefully consider two competing demands. Firstly, we should ensure that students have a high success rate: at least 70-80% for new material and even higher for material that is being practiced and firmed. Using worked examples and completion problems can make this level of success a reality, helping to minimize unwanted cognitive load. Secondly, we should use backwards fading so that students are asked to complete applications independently as quickly as possible. If we keep presenting worked examples, then not only will this waste time, but it may also prevent students from developing the ability to complete tasks without support. The example lessons above are an attempt to show how support should be faded, following the I-we-you continuum and ending with independent student application. How long students spend on each stage of the continuum is an empirical question and will largely be determined by the quality of examples (and non-examples) and completion problems that you use as well as the proficiency and prior knowledge of the students. Feedback to the teacher is key here: if students are performing successfully on a stage, then you can make the transition to a lower level of support, increasing the number of steps that a student is expected to complete.
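
For anyone who likes to see a rule of thumb written down precisely, here is a minimal Python sketch of the decision described above. It is only an illustration: the stage names, the function names and the 0.8 threshold (the upper end of the 70-80% range) are my own assumptions, not a procedure taken from Efficiency in Learning or from Direct Instruction.

```python
# Stages of the backwards fading continuum, from most to least support.
STAGES = ["worked example", "completion problem", "independent problem"]


def success_rate(results):
    """Return the fraction of successful attempts in a list of True/False results."""
    if not results:
        return 0.0
    return sum(results) / len(results)


def next_stage(current, results, threshold=0.8):
    """Move to the next, less supported stage only if the class success rate
    meets the threshold; otherwise stay on the current stage.

    `threshold` is an assumed value, not a figure fixed by the research.
    """
    index = STAGES.index(current)
    if success_rate(results) >= threshold and index < len(STAGES) - 1:
        return STAGES[index + 1]
    return current


# Hypothetical class data: 9 of 10 students completed the half model successfully.
print(next_stage("completion problem", [True] * 9 + [False]))
# Only 5 of 10 succeeded, so the class stays on completion problems.
print(next_stage("completion problem", [True] * 5 + [False] * 5))
```

In practice the judgment is far less mechanical than this, since the quality of the examples and the prior knowledge of the class matter as much as any single number.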

How do you optimize practice drills?

Practice sentences should probably involve content from whatever it is you are studying as this will stretch and develop student thinking about the subject matter. Not only will students be developing their writing skills, but they will also be deepening their understanding of the content.

In Teach Like a Champion, Lemov explains ‘At Bats’, the idea that ‘succeeding once or twice at a skill won’t bring mastery, give your students lots and lots of practice mastering knowledge or skills’. This is crucial. When students are asked to complete problems independently, they should practice extensively, ideally across a number of lessons and including ‘multiple formats and with a significant number of plausible variations.’ Here are some of the possible variations when teaching appositives:

  1. Varying where the appositive is in a sentence (start, embedded or the end)
  2. Varying the writing genre for the task (analytical, descriptive, rhetorical)
  3. Varying the content (Macbeth, An Inspector Calls)
  4. Adding quotations to the appositive
  5. Varying the number of appositives in a sentence
  6. Varying the length and level of detail within the appositive: adding ‘who/that’ is a great way of adding further description
  7. When they have mastered all of the above, combining the appositive with other sentence skills that you are teaching

Engelmann’s track systems within his programs are fixed and have been created based upon rigorous field testing so as to ensure optimum spacing, fading and efficiency. This requires a phenomenal amount of work and is almost certainly beyond the reach of busy teachers. At present we are not at a stage where these sequences have been formalized into our booklets, but this is slowly changing as we have developed progression models for both grammar and analytical skills, ensuring that we are consistent and that students master the skills of writing in a logical and methodical sequence.

Next Post: Further examples of backwards fading in English

Applying Cognitive Load Theory to English part 3: The Problem Completion Effect: An Overview

cogload theory

This is the third post looking at the application of Cognitive Load Theory to teaching English. The first two can be found here and here.

In Cognitive Load Theory, Sweller et al posit that ‘One early concern about the use of worked examples was that they led to passive rather than more active learning. Would learners attend to and study the worked examples in enough depth or would they simply gloss over them’. One of the solutions to this concern was the development of The Alternation Strategy, an approach that I wrote about in the previous post. Another solution to the concern that students would not pay sufficient attention to the worked example, and therefore not build the required schemas within long term memory that they could use when attempting subsequent problems, was the development of Completion Problems, tasks that include ‘a partial worked example where the learner has to complete some key solution steps’.

So why do they work?

efficiency in learning cover

In Efficiency in Learning, Clark et al posit that ‘completion examples reduce cognitive load because schemas can be acquired by studying the worked-out portion. Requiring the learner to finish the worked example ensures that she will process the example deeply.’ If we accept the idea that when teaching novices, our time in class should be spent largely on building schemas and developing background knowledge, then this is important. Completion problems could help ensure that students pay sufficient attention to the worked examples. The authors summarise their efficacy by positing ‘a completion example offers psychological balance. It reduces cognitive load by incorporating some worked-out elements and it fosters deep processing by requiring completion of the remaining elements’. An intermediate approach, completion problems are an attempt to alleviate the concerns and possible shortcomings with fully worked examples and problem solving. While problem solving may create excessive cognitive load, completion problems help to mitigate this. While fully worked examples may be ignored by students, completion problems require more mental effort, hopefully resulting in students thinking harder about them.

What does the research say?

worked example and completion problem table




The table above displays the results from a 1992 study where students were taught statistical concepts like mean, median and mode. Students were placed in one of three instructional set ups: all problems, worked example and practice pairs (The Alternation Strategy), and completion problems and practice pairs. Further studies (Paas & van Merriënboer 1994; Trafton & Reiser 1993) also found that worked examples and completion examples were ‘more efficient and equally effective in terms of learning outcomes than lessons that required learners to work all problems’.

So, if both worked examples and completion problems are ‘equally effective’ then how are we to know which one to use? This is an important question and there is a possible solution in an approach known as ‘backwards fading’.

Backwards Fading

One of the six principles of task design in Engelmann’s Direct Instruction is the shift from an emphasis on the teacher’s role as a source of information to an emphasis on the learner’s role as a source of information. This shift broadly matches the notion of the ‘I-we-you’ model common in explicit and direct instruction. The three stages also seem to broadly link to some of Rosenshine’s principles of instruction. This broad convergence should make us stop and think: if three frameworks all broadly identify a similar approach as being effective, then perhaps this is a continuum that should be systematically threaded into our instructional sequences and schemes of learning.

3 frameworks.png

Cognitive Load Theory also describes this gradual fading of support and presents an almost directly analogous continuum that students move along as they gain in proficiency. In Efficiency in Learning, the authors suggest that ‘fading techniques allow you to accommodate a gradual learning process’, a statement that seems to echo the DI approach of small, manageable and incremental steps along a learning pathway. As learners begin, they ‘should devote as much working memory as possible to building schema’, and the most efficient way of doing this is probably by studying worked examples that precisely exemplify not only the concept that is being taught, but also what success looks like. While abstract and ambiguous success criteria may be useful for teachers, concrete examples are probably far more useful for students.

A lesson, or series of lessons, that involve ‘backwards fading’ will begin with worked examples, providing concrete models of what is being taught. The next worked example could be a ‘completion problem’ where students are expected to finish a particular task that has been started for them; eventually, learners will be asked to attempt entire problems for themselves.

Here is a conceptual model of this process from Efficiency in Learning:

backwards fade.png

While this conceptual model describes a lesson, if we follow Engelmann’s guidance, new concepts should be initially taught across at least two lessons and then continued in a track system across many more, resulting in distributed practice and becoming interleaved and integrated with other things that have been taught, eventually resulting in flexible, wide application.

What does ‘backwards fading’ look like in English?

Here is an example instructional sequence from teaching ‘Even though’ sentences to a very low ability year 7 class. The sequence was during one lesson and it combines ‘backwards fading’ with Engelmann’s ideas about examples and non-examples:

Written Worked Examples:

  • Even though it was raining, I went outside without an umbrella. (surprising/opposite)
  • Even though it was raining, I went outside with an umbrella. (not surprising/not opposite)

Following Engelmann’s guidance, I presented minimally different examples that share the greatest number of irrelevant features to make clear the meaning of ‘Even though’, ensuring that, logically, only one interpretation is possible. I labelled them as ‘surprising’ or ‘not surprising’ in an attempt to describe the function of ‘Even though’. To avoid ‘stipulation’, the idea that students will erroneously infer that ‘Even though’ sentences are always about weather or involve umbrellas, I followed these written examples with a series of wider ranging oral examples, asking students ‘OK’ or ‘Not OK’. I did this using choral responses where students responded to a signal (in this case, me dropping my arm) in order to maximize student response rate and allow me to get feedback from the whole class as to their understanding. If only one student answered, this would not tell me if the others had understood. The signal is to ensure that everyone responds at the same time so as to minimize the opportunity that students will merely copy their peers. Although the sequence below certainly breaks some of the rigorous theoretical principles that examples and non-examples are supposed to adhere to, it resulted in a very high success rate.

Oral Example sequence:

Even though I was the fastest, I lost the race.    OK

Even though I was the fastest, I won the race. Not OK

Even though she loved pizza, she ate one.       Not OK

Even though she hated pizza, she ate one.          OK

Even though I loved football, I played tennis.     OK

Even though I wanted to see my friends, I met them.   Not OK

Even though he had no money, he went shopping.      OK

Even though I was the best at physics, I passed the test.   Not OK

Completion Example 1:

  • Even though I was starving, I
  • Even though I hated eating anchovies, I
  • Even though she was brilliant at Mathematics, she

After the choral responses, students attempted these styles of sentence where I had begun the construction and asked them to complete them.

Completion Example 2:

  • Even though
  • Even though
  • Even though

Here, the worked part of the construction had been stripped back even further, allowing students to attempt even more of the problem themselves. Following the conceptual model from Efficiency in Learning, the final task saw students creating their own sentences with no completed steps or support.

I am currently teaching the same class how to use embedded evidence in sentences, a crucial foundational skill for analytical writing which is entirely new to my students. Because of the complexity of what I am asking them to do, the instructional sequence has already spanned over four weeks, broadly following the ‘backwards fading’ continuum. Typically, practice exercises will be completed every lesson, although they are limited in duration and combined with lots of other activities. I will write about this process in more detail in a later blog.

Backwards fading seems like a sensible approach when teaching almost anything as, if it is done correctly by spending sufficient time on each stage, it seems to hit a sweet spot where cognitive load is managed and student effort is maintained. The ‘backwards fading’ continuum is a sequence of tasks that slowly increases learner application and effort whilst simultaneously reducing the need for worked steps. As students grow in proficiency, they are required to solve more of the problem themselves, a shift that represents the transition from novice to expert.

Next post: Combining The Alternation Strategy and Problem Completion Effect: Examples from teaching writing.

Applying Cognitive Load Theory to English Part 2: The Alternation Strategy - How example problem pairs can work in English.

This is the second post looking at the application of Cognitive Load Theory to English. The first one can be found here.

‘The Alternation Strategy’, also referred to as ‘worked example problem pairs’, is the idea that ‘for an example to be most effective, it had to be accompanied by a problem to solve.’ The most effective use of worked examples is to ‘present a worked example and then immediately follow this example by asking the learner to solve a similar problem.’ Interestingly, the researchers found that if you give students a number of massed worked examples and follow that later with a similar massed set of problems, then this led to poor outcomes. For The Alternation Strategy to work effectively, example and problem need to be presented together as a pair, with the problem immediately following the example.

Greg Ashman has written here about how this applies to Maths teaching.

One of the specific benefits of using worked example problem pairs is that they accelerate learning, reducing the time required for instruction. In Efficiency in Learning: Evidence-Based Guidelines to Manage Cognitive Load by Clark, Nguyen and Sweller, the authors explore a 1985 study by Sweller and Cooper which used algebra problems. Students were assigned to two groups: one in which students completed eight problems and the other in which they studied four sets of example problem pairs. Here are the findings:

worked example results table

The ‘all practice’ group took nearly six times as long to complete the instructional sequence. Students in the example problem pair group were not only faster at completing the lessons, but they were also faster at completing the test which followed. Additionally, the number of test errors was lower for those students who had studied under the Alternation Strategy (example problem pairs).

Another later study quoted in the book was ‘Conducted in Chinese middle schools in which a traditional three-year course consisting of two years of algebra and one year of geometry were successfully completed in two years by replacing some practice with worked examples!’ This confirms the findings of the previous study and suggests that the technique can be adapted to real life classrooms and learning.

Although both of these studies involve maths, in Cognitive Load Theory by Sweller, Kalyuga and Ayres, the authors make the important point that ‘the cognitive architecture…does not distinguish between well-structured and ill-structured problems’ meaning that the findings of Cognitive Load Theory apply to all domains.

So why does this work? In Efficiency in Learning: Evidence-Based Guidelines to Manage Cognitive Load by Clark, Nguyen and Sweller, the authors explain that ‘Having a worked example to study just prior to solving a similar problem provides the learner with an analogy available while solving the problem. When having to actively solve a problem without the benefit of an analogous example, most working memory capacity is used up for figuring out the best solution approach, with little remaining for building a schema’. Not only does the worked example present an indication of the ‘best solution approach’, providing students with a clear idea as to what is expected when they attempt the problem, but it also exemplifies the level of quality that they are to aim for.

Applying the Alternation Strategy in English

This is how we apply this approach in English. Here is a screenshot of a typical double page spread from one of our ‘booklets’ (essentially in-house textbooks that we produce and centrally plan for each unit of study):

booklet double

Before working through the stages, students would have typically completed a cumulative recap quiz or a sentence practice activity. They may also have begun the lesson with some whole class feedback and tasks that address common misconceptions from a previous piece of work.

An overview and key:

In the screen shot above, I have numbered each section in order to make this explanation clearer; lessons would normally work through the sections in numerical order.

Stage 1: A vocabulary table

This provides students with words to use in their analysis. We list all forms of the word, labelling each one with word class, an approach that, as a result of systematic and regular usage, has helped students to categorise words and to make generalisations about how different affixes can change one class to another. Crucially, the table contains an example sentence for each word, deliberately using the sentence structures that we want students to use when they are writing and each one acting as a mini worked example. The teacher will ask questions about the vocabulary, provide further examples, antonyms and synonyms and elaborate further. Students will make annotations to their vocabulary table, following how the teacher annotates under the visualiser. The teacher will make links to previous learning, asking about preceding lessons on the same text as well as other units, taking advantage of distributed retrieval practice.

They may ask students to complete oral because/but/so sentences:

After reading and discussing the relevant row from the table:

Teacher: Finish the sentence, starting from the beginning: ‘Gerald objectifies Eva Smith because…’

Student: Gerald objectifies Eva Smith because he compliments her appearance.

Teacher: Add evidence.

Different Student: Gerald objectifies Eva Smith because he compliments her appearance, calling her ‘Young and fresh and charming.’

Stage 2: The First ‘problem’

In this case the question is What kind of man is Gerald?

Stage 3: The first extract to annotate

Students have already read many of these quotations in the vocabulary table. Many of the example sentences from the table contain embedded evidence that uses these lines. Through a process of quick questioning, the students will get lots of massed practice in seeing how the vocabulary words apply to the quotations. The teacher will annotate their copy under a visualiser: I tell students that their annotations must, at a minimum, look exactly the same as mine, although they are free to add additional ideas, opinions and interpretations that come up in focussed discussion and questioning. This stage of the process allows the teacher to model annotation ‘live’ and directly, providing a worked example of this crucial stage of textual analysis. The skill of annotation, something that Joe Kirby has written about here, allows students to engage in close analysis and deconstruction of language, helping them to understand the important link between text and interpretation. Annotations also serve as a useful source of revision, capturing ideas and thoughts and making them permanent.

Underlined quotations are the ones that students will memorise (essential for GCSE literature), most likely in a test that will be given a month or so after they have encountered them here, exploiting the benefit of distributed practice.

Stage 4: Worked Example

The worked example uses the example sentences from the vocabulary table. The idea here is to demonstrate how the constituent sub-components (sentences and vocabulary) fit together to form a more complex whole. It also uses the lines and ideas from stage 3, demonstrating how annotations and notes can be transformed into analytical prose. The worked examples also contain ‘The 6 Skills’ (our analytical framework) and exemplify how they are applied to a specific problem. The 6 Skills are first taught in year 7, the idea being that they are a generalizable strategy that can be applied to a wide range of analytical problems and situations. In year 7, they are taught and practiced at a sentence level, building cumulatively to their application within more extended writing. Over a five year curriculum, students will see them used in conjunction with the full spectrum of analytical writing, including character, setting and thematic responses.

How do we use the worked example?

In Efficiency in Learning Sweller et al posit that ‘To be effective, a worked example must be studied’. Following Engelmann’s ideas that tasks should ‘shift’ from prompted to unprompted as well as from teacher led to student led, none of the worked examples are labelled in the booklets, allowing teachers to judge the approach based upon the proficiency of the class. All classes use the same models and the level of in class support, as well as the focus of the final writing task, is chosen by the teacher.

Here are some possible approaches:

A) Low level class who are inexperienced with writing analytically.

Under a visualiser, a teacher may underline, label, explain and question most if not all relevant aspects of the model, making it clear which parts are important and asking students to do the same on their copy. They may choose to merely focus on the easier skills like ‘3 part explanations’ or ‘zoom in on a word’, or even simpler, more fundamental skills like embedded evidence.

B) Class who have some proficiency with analytical writing.

In Efficiency in Learning Sweller et al describe something called A Completion Example, essentially a hybrid strategy where ‘some of the steps are demonstrated as in a worked example and the other steps are completed by the learner as in a practice problem’. With students who have already acquired some of the analytical skills or specific sentence constructions, a teacher may label one or two examples of a specific analytical skill within the model paragraph, asking students to copy their annotations. The teacher may then ask students to find other examples, using the teacher directed ones as models to guide their annotations. The next post will explore the utility of completion problems in more detail.

C) More proficient students

If a class is proficient, the teacher may ask students to annotate with minimal teacher instruction, essentially using the model as a means of retrieval practice of analytical skills as well as allowing students to broaden their understanding of the scope and breadth of those skills. If a student has seen multiple, slightly different examples of how a specific skill has been applied across different contexts, then this exposure will hopefully lead to a firmer understanding of it.

These particular approaches will be used, irrespective of proficiency level:

  • Teachers will ask multiple questions about the target vocabulary, asking students to cover the vocabulary table beforehand to ensure that they are engaging in actual retrieval practice. They may make annotations next to words e.g. exploit=use/take advantage
  • Teachers will ask about the interpretations and analysis, often asking students to use vocabulary from the table. Why might Gerald’s attempt to find food mean he is benevolent? Is he exploiting Eva here or being compassionate? What does ‘distressed’ tell us about Gerald?

Stage 5: The second extract to annotate:

Students have already been taught vocabulary in stage 1, applied it in stages 3 and 4 and now they will need to apply it again in stage 5. Again, the proficiency of the class will determine how teacher led this annotation segment will be, but most importantly, students will apply the vocabulary from the table in stage 1 when annotating the lines. In stage 3, students were led through a worked example of annotations and this can be used as ‘an analogy’ with which to support their annotations in this second text extract. Thinking analytically about a text is, by its very definition, something covert and implicit, a process that exists in the mind of a writer but cannot be observed. Asking students to ‘overtise’ this process by annotating a text can provide valuable feedback to the teacher. Have they understood and applied the vocabulary from the table to the appropriate evidence or extract? Have they correctly identified a technique? Have they made a link between similar pieces of evidence to build up an argument? The teacher can circulate during this stage and address any misconceptions promptly, preventing them from becoming embedded and ensuring that the errors do not manifest themselves when the student completes the final written task. During the acquisition stage of learning, when students are learning new content and lack proficiency, immediate feedback is key to prevent errors from becoming embedded.

Stage 6: Second Problem

When I first started using model answers, I would demonstrate one and then ask students to write their own. This almost always resulted in indolent or weaker students copying the model without thinking at all about the task. Higher ability students would also explain that the model was a hindrance. It was as if the model had an intrusive and malign anchoring effect: students didn’t want to deviate from it as the implicit assumption was that it exemplified excellence; however, they didn’t want to entirely emulate it as this was clearly just plagiarism.

Here is a sequence from the old, problematic approach:

1) Question

2) Worked Example


3) Student response (plagiarised from or hindered by the model!)

Here is a sequence for the new approach:

1) Vocabulary that applies to both questions (containing multiple worked examples of sentences to be studied)

2) First Question

3) First extract to annotate (acting as a worked example of annotation)

4) Worked Example to study (contains vocabulary and lines from stages 1 and 3)

5) Second extract to annotate (these are different lines to the ones explored in the model)

6) Second Question

This six stage process has been designed in order to avoid the weakness with my earlier approach. The first and second questions may vary in focus and wording, although they may be exactly the same. Because the lines in stage 5 are different, it prevents students from merely copying the model answer. Instead, the model can act in the way that Sweller intends when he explains that ‘Having a worked example to study just prior to solving a similar problem provides the learner with an analogy available while solving the problem’. However, students are still able to apply the sub-components (vocabulary, sentence structures, analytical skills) that they have been explicitly taught.

We have methodically and systematically threaded this approach into all schemes of work from the start of year seven onwards. When students engage in analytical writing, they almost always encounter these double page spreads.

The alternation strategy is a core part of our resources and this table provides a summary of how it works in our booklets:

Next Post: ‘The Problem Completion Effect’: An Overview

Applying Cognitive Load Theory part 1: Overview and The Worked Example Effect

This is the first blog post looking at how Cognitive Load Theory can be applied in the classroom.

I first came across Cognitive Load Theory a few years ago via Greg Ashman’s informative and prolific blog. Since then, interest in John Sweller’s work has spiked and it seems to be one of the hottest topics amongst research-informed teachers.  Dylan Wiliam has even gone so far as to say:


While much of the interest and discussion surrounding Sweller’s ideas is centred around the domains of science and maths (this is understandable as a large number of his studies focus on these areas), this series of blogs will explore how I have been attempting to apply his ideas to English, a subject area that Sweller would call an ‘ill-structured learning domain’.

If you would like to know more about Cognitive Load Theory, here are some useful resources:

1) Greg Ashman’s blog has many detailed posts about CLT

2) This succinct and practical summary 

3) Oliver Caviglioli has made a fantastic graphic overview of Cognitive Load Theory by Sweller, Kalyuga and Ayres

Six years ago I read Why Don’t Students Like School by Daniel Willingham, a text that not only made me reconsider almost all aspects of how I was teaching, but also acted as a springboard into the depths of educational research. His explanation of the importance of memory and the conceptual distinction between working and long term memory revolutionised how I thought about instruction and made it abundantly clear that I had not been focussing upon the vital notion of retention. Cognitive Load Theory is also based on the conceptual difference between working and long term memory and provides a number of strategies to optimise instruction within that framework.

All subsequent quotations and references that I use are from Cognitive Load Theory by Sweller, Kalyuga and Ayres.

An Overview of some of the Theory 

What is it that makes experts proficient? In 1973, a study was conducted to investigate what made grandmaster chess players superior to other players. While an intuitive answer may have attributed their dominance to more proficient problem solving abilities, the application of a generic ‘means-ends’ analytical approach or the fact that they weighed up and considered a wider range of alternative strategies, the reality was a difference in their memories. Players, both expert and novice, were shown a chess board with pieces arranged in plausible and typical game situations for 5 seconds. When asked to recall the positions of the chess pieces, expert players were significantly and consistently better than novices. However, if the pieces were arranged randomly, then this gap in performance disappeared: experts and novices performed the same. With the random configurations, experts could not rely upon recalling thousands of game configurations as the pieces did not conform to or fit game patterns that they had stored in long term memory. Similar results have also been found in other domains, including recall of text and algebra. The conclusion of these studies is that when solving problems or engaged in cognitive work, experts within a field rely upon their larger and more developed long term memory deposits, patterns of information that are also called schemata. While short term memory has a limited capacity, long term memory capacity is vast and seemingly endless.

Recognising the fact that novices have less relevant knowledge stored in their long term memory, Sweller et al explain: ‘Novices need to use thinking skills. Experts use knowledge’. Because ‘thinking skills’ rely upon working memory, an aspect of cognition that has a small and fixed capacity for holding and manipulating items, novices soon reach the limits and, due to excessive cognitive load, find tasks difficult or impossible as a result. The implications of these findings are striking for teachers. In a general sense, we should be spending much if not most of our time as teachers trying to increase our students’ domain specific background knowledge so that we can help them overcome the seemingly unalterable limits of their short term memory and instead recall, apply and use relevant knowledge from their long term memories. Sweller et al posit that ‘we should provide learners with as much relevant information as we are able’ and that ‘assisting learners to obtain needed information during problem solving should be beneficial’. They also posit that ‘Providing them with that information directly and explicitly should be even more beneficial.’ I have written before about how we provide students with relevant vocabulary when responding to texts, as well as how to create focussed practice activities that assist learners when solving problems.

The Worked Example Effect

In short, the worked example effect refers to the idea that if you want novices to succeed in a domain, they would be better studying the solutions to problems rather than attempting to solve them. Asking students to repeatedly write extended answers to questions ‘unnecessarily adds problem-solving search to the interacting elements, thus imposing an extraneous cognitive load’. In the absence of well-developed background knowledge, students flounder because they have little stored in their long term memories to help them. Comments in class such as ‘I don’t know how to start’ and ‘what do I write’ are sometimes indicative of this scenario.

Responding analytically to texts is a complex activity containing multiple components, many of which are abstruse for novice learners. If you try to describe these elements, you are forced to use abstract phrases like ‘sophisticated analysis’ and ‘judicious use of quotations’ and, in the absence of examples, these terms merely serve to mystify the process further. This is the language of mark schemes, terminology that may make sense to experts but leaves novices confused. Creating worked examples (in English this may mean sentences, paragraphs or essays) exemplifies these opaque terms, converting the abstract into the concrete.

Sweller et al posit ‘worked examples can efficiently provide us with the problem solving schemas that need to be stored in long-term memory’. Studying worked examples is beneficial because it helps to build and develop students’ background knowledge within their long term memories, information that can then be recalled and applied when attempting problems. The grandmasters in the chess study were successful because of the breadth and depth of their background knowledge. Similarly, English teachers find writing (one of the problems in our domain) easy because we have long term memories that contain myriad ‘problem solving schemas’ and mental representations of analytical responses to texts.

If we accept the notion that short-term memory capacity is pretty much fixed as well as the idea that we cannot really teach generic higher order thinking skills, then building domain specific background knowledge may be our most important job as teachers. Studying worked examples is more effective and efficient than merely attempting problems: deconstructing and studying model sentences, paragraphs and essays should, in the long run, be superior to merely writing them.

Research into The Worked Example Effect in English

In Cognitive Load Theory, Sweller et al refer to English, the humanities and the arts as ‘ill structured learning domains’ to distinguish them from mathematics and science. They make the point that while maths and science problems have ‘clearly specified problem states or problem solving operators’, essentially rules that dictate process and approach, ‘ill structured domains’ do not have such rigid constraints. Although there are subjective elements within English and often innumerable ways of approaching a task, different approaches may be considered of equal worth and demonstrate a comparable level of proficiency. The variables within analytical writing can, like the colours within a painter’s palette, be arranged in numerous and diverse patterns; however, these different configurations can be judged to contain equivalent skill and quality. Despite this, the researchers make the important point that ‘the cognitive architecture…does not distinguish between well-structured and ill-structured problems’ meaning that the findings of Cognitive Load Theory apply to all domains. The researchers also posit ‘the solution variations for ill-structured problems are larger than for well-structured problems but they are not infinite and experts have learned more of the possible variations than novices.’ Over the years, teachers have read, thought about and produced innumerable pieces of analysis and, as a result, have developed rich schemata of this kind of knowledge which we can recall, choose from and apply when dealing with problems.

Sweller et al point out that ‘even though some exposure to worked examples is used in most traditional instructional procedures, worked examples, to be most effective, need to be used much more systematically and consistently to reduce the influence of extraneous problem-solving demands’. A five year curriculum that systematically and consistently uses worked examples should help students build rich schemata of ‘possible variations’, moving them more quickly and efficiently along the continuum from novice to expert than if they had just completed lots of writing tasks. The constant studying of concrete worked examples is far superior to describing proficiency using abstracted and often vague descriptors and success criteria. When describing complex performance in the absence of concrete examples, which is the purpose of a mark scheme, the sheer breadth and possible variation of what is being described necessitates a wide lens of representation. While this is advantageous to the expert, allowing complexity to be summarised and condensed, it is obfuscatory and perhaps even meaningless for students. Experts have detailed and abundant schemata that exemplify abstract terms like critical analysis, judicious references, contextual factors; novices do not.

In Cognitive Load Theory, two studies directly relevant to English are referenced. In the first (Oksa, Kalyuga and Chandler 2010), students were given extracts from Shakespearean plays, half receiving texts with accompanying explanatory notes, the other half receiving no additional notes. Perhaps unsurprisingly, the group who were given the notes performed better on a comprehension task. In another study cited in the book (Kyun, Kalyuga and Sweller), students were given an essay question to answer. One group received model answers to study, the other did not. The study found that ‘the worked example group performed significantly better than the conventional problem-solving group’.

What does this look like in English?

If we want students to perform well in complex tasks like writing, we should be giving them the necessary information ‘directly and explicitly’. Echoing Engelmann’s sentiment that we should teach everything students will need, Sweller et al’s work also points to the superiority of explicit, direct instruction, approaches that seem more efficient and effective for novice learners. With regard to English, we should be explicitly teaching sentence structures and vocabulary. We should provide this information to students when they are completing extended writing, and one way of doing this is through vocabulary tables that contain definitions and examples: not just examples of how the vocabulary words are used, but also examples of the sentence styles that students should include. Each of these example sentences is a worked example in itself and, with effective teacher questioning and annotation, can be a powerful way of turning abstract and amorphous success criteria (use sophisticated sentences/use a range of complex sentences etc) into concrete examples that the learner can ‘study and emulate’. Here is a section of a vocabulary table for London, one of the GCSE poems:

poetry vocab table

To minimise cognitive load, students have these tables when they are annotating the poem, allowing them to make the link between text and interpretation.

Although Cognitive Load Theory contains a number of different effects, the worked example effect is described by the researchers as being ‘the most important’ and, because of this importance, we have incorporated it into all stages and aspects of our curriculum. Almost always, when students are asked to write, they will have studied a related and relevant worked example.

Next post: The Alternation Strategy: How example problem pairs can work in English.

Insights from DI part 9: The Sequencing of Skills

This is the ninth post looking at how ideas from Engelmann’s DI can be applied to the everyday classroom. The first eight can be found here: one, two, three, four, five, six, seven, eight.

Like the last post, this one will primarily examine The Components of Direct Instruction by Cathy L. Watkins and Timothy A. Slocum, an article from The Journal Of Direct Instruction and an extract from Introduction to Direct Instruction. The paper can be found here:

In the last few posts, I have explored a number of factors that determine whether or not an instructional sequence is effective. Here is a quick recap:

a) We should choose to teach high utility concepts that have wide applications, ensuring that students can ‘exhibit generalised performance to the widest possible range of examples and situations.’

b) Teaching through examples and non-examples may be more efficient than relying upon lengthy, abstract and confusing explanations. Examples should be rigorously chosen and sequenced to maximise clarity and efficacy.

c) Instructional formats should be suited to the level of proficiency of the students. Earlier on, they should be overt and split into logical, sequential steps, allowing teachers to give precise feedback and error diagnosis. Later on, the steps should be removed, encouraging more independent application.

d) Linked heavily to the last point, practice activities and tasks should broadly move along six different continuums as students develop in proficiency.

The Sequencing of Skills

One of the core premises of Direct Instruction is that ‘students should be well prepared for each step of the program to maintain a high rate of success’. On new material, students should be at least 70% successful; on material that is being firmed and practised, they should be 90% successful. In order to achieve these impressively high success rates, sequences of learning should be systematically ordered according to four main guidelines.

1) Prerequisite skills for a strategy should be taught before the strategy itself.

In a previous post I explored the idea that DI schemes teach ‘everything students will need for later applications’, and this is almost always achieved by teaching the individual, component parts of a more complex skill. The most obvious example of this idea is decoding. If a student cannot decode properly (and it is unfortunate that some still leave primary school in this position), then all other higher order skills are unattainable. You cannot understand the meaning of a text if you cannot convert graphemes into phonemes. You cannot analyse language if you cannot decode it. In fact, if you cannot decode properly, then it is very hard to learn anything at all. Although a few years ago I would perhaps have accepted that some people will never learn to read (ignorance allows the mind to form all kinds of excusatory justifications), I have since learnt that almost everyone can learn to decode, given the right instruction. At our school, we have six teachers, including myself, who are trained to deliver Thinking Reading, a systematic and highly effective reading intervention. Our first student graduated from the programme a month or so ago: she began the course decoding at the level of a nine-year-old and, after six months, she is now decoding at the level of a fifteen-year-old, making an average rate of progress of one year for every three hours of instruction. If you have students who are weak decoders, then I would highly recommend Thinking Reading!

I wrote here about the importance of teaching the different components that make up a complex task before students actually attempt the more difficult application. Here is another example:

components before whole

The underlined parts of the sentences are noun appositives, constructions that rename nouns within a sentence. The bullet pointed list contains some (not all: you can break this down into far more components!) of the ‘prerequisite skills’ that would need to be taught before students can successfully attempt to create appositives. Kris Boulton wrote a series of blogs about how he applied Engelmann’s ideas to teaching simultaneous equations; this one explores the thirteen sub-components that he decided needed to be taught explicitly before students attempted the entire equation.

2) Instances consistent with a strategy should be taught before exceptions to that strategy.

According to the article, ‘Students learn a strategy best when they do not have to deal with exceptions’. When learning something new, exceptions will confuse students and impede their understanding, particularly if the learning is centred around a rule or some form of ‘If-then’ statement. However, ‘once students have mastered the basic strategy, they should be introduced to exceptions. For example, when the VCe rule is first introduced, students apply the rule to many examples (e.g. note) and nonexamples (e.g. not). Only when they are proficient with these kinds of words will they be introduced to exception words (e.g. done)’.

Here is an example about eusociality.

incident exception

Most eusocial species are insects, many of which belong to the Hymenoptera order. If you were teaching this concept and following Engelmann’s sequencing rule, it may make sense to deal with this large set first, waiting until students have mastered it before introducing mammal and crustacean examples.

In Theory of Instruction, Engelmann presents detailed instructions about how to deal with large sets of items, specifically sets that contain subsets that should be split from the main group and taught separately to avoid confusion.

3) Easy skills should be taught before more difficult ones.

There seems to be a position amongst teachers that allowing students to struggle is a good and desirable thing. Lots of people seem to be talking about Grit at the moment, valorising the idea of resilience in the face of difficulty. Some teachers equate hard work with learning, seeing a strong and fixed line of causation running between them. The problem with this idea is that the success of a program of study is predicated on the tenacity and perseverance of the student instead of the quality of the curriculum or the teacher. While persistence and stoicism are admirable traits that we should praise and encourage, relying upon them for a student’s success seems risky: if the student does not have these characteristics, what then? When I began teaching, I foolishly and ignorantly believed that if I raised the level of challenge, students would magically respond with increased determination and effort, resulting in improved attainment and higher levels of proficiency. Unsurprisingly, this did not work. All I was doing was increasing their feelings of inadequacy and highlighting their lack of understanding whilst doing nothing at all to help them improve. If students begin with easier tasks, they are far more likely to succeed; as the article points out ‘The experience of success is one of the most important bases of motivation in the classroom.’

Here is a possible overview of KS3 poetry teaching, following the idea that easier skills should be taught first.

Year 7: Teach techniques, sentence level analysis and single paragraph responses.

  • Focusing on the sub-components of analysis and getting students to master them in isolation is probably more effective than continually practising the production of multi-paragraph analysis. Not only is feedback easier to give when skills are drilled, but students also experience success, which is the ultimate form of motivation.

Year 8: Begin to teach comparative structures; multiple paragraph responses.

  • Comparing poems is hard: synthesising information from two texts is more complex than dealing with just one. Like in year 7, we begin by drilling comparative structures in isolation, building up to writing comparative multi-paragraph responses.

Year 9: How to approach unseen

  • We deliberately leave unseen until year 9 as success here depends heavily upon a student’s vocabulary and background knowledge. Although we teach an approach to unseen in year 9, students still experience the majority of the poems that they encounter through teacher-led, explicit instruction. Once students have mastered the generic approach to unseen poems, there are diminishing returns to practising lots of them: if students are attempting multiple unseen tasks without support, what are they learning? Could this time be spent teaching more complex poems and the rich vocabulary that would be needed when responding to them?

GCSE: Consolidation of KS3; analytical essay introductions; development of entire essay responses.

  • The idea here is that students enter KS4 having mastered everything that they will need for GCSE, allowing these two years to be used for honing whole essay responses.

Crucially, this progression model is cumulative and all of the easier skills that are taught in the earlier years are encountered, used and applied throughout all subsequent poetry sequences. Students are taught sibilance in year 7 in a unit on Poetry from other Cultures. They will use it again with Romantic poetry, Civil Rights poetry and Dystopian poetry in year 8. In year 9, they will need it when looking at War poetry, Victorian poetry and Shakespearean Sonnets. For GCSE, they will need it again when analysing their anthologies.

4) Keep confusing things separate.

If things are incredibly similar, then we should not introduce them at the same time. I am currently using ‘Teach Your Child to Read in 100 Easy Lessons’ to teach my daughter to read. The ‘d’ sound is taught in lesson 12 and the course waits until lesson 54 to teach the ‘b’ sound because of the huge potential for confusion. Both are voiced consonant sounds and the symbols are exactly the same except for their orientation on the page, reasons why confusion between these two is common amongst weak readers.

Participle phrases and some absolute phrases are very similar and as a result, they should be separated to minimise confusion. As absolute phrases are the furthest away from functional, everyday language, I would probably teach them last, beginning instead with participle phrases.

Present Participle phrase: Raising her voice, she seemed to be getting angry.

Absolute phrase: Her voice raising, she seemed to be getting angry.

Both constructions not only contain participles, but they use exactly the same words, demonstrating just how close these two phrases are to each other.

Whatever subject you teach, there will be numerous examples of pairs or groups of concepts that students confuse, perhaps because of their function, perhaps because of their spelling or maybe because they have similar definitions. If these points of confusion are already apparent, then thinking deliberately about when they are taught and separating them from each other may go some way towards preventing this confusion.

Next post: Cognitive Load Theory-The Worked Example Effect.




Insights from DI part 8: The 6 Shifts of Task Design

This is the eighth post looking at how ideas from Engelmann’s DI can be applied to the everyday classroom. The first seven can be found here: one, two, three, four, five, six, seven.

Like the last post, this one will primarily examine The Components of Direct Instruction by Cathy L. Watkins and Timothy A. Slocum, an article from The Journal Of Direct Instruction and an extract from Introduction to Direct Instruction. The paper can be found here.

The 6 ‘shifts’ of task design

The last post looked at the idea that ‘the support that is so important during initial instruction must be gradually reduced until students are using the skill independently, with no teacher assistance.’ According to the article, ‘Becker and Carnine (1980) described six ‘shifts’ that should occur in any well-designed teaching program to facilitate this transition’.

1) The shift from ‘overtised’ to ‘covertised’ problem-solving strategies.

Initial instruction will break down a concept or strategy into multiple individual steps (see ‘Format 1’ in this post). In Theory of Instruction, Engelmann explains the difference between physical and cognitive operations. A physical operation includes things like ‘fitting jigsaw puzzles together, throwing a ball, ‘nesting’ cups together, swimming, buttoning a coat’: essentially any process that involves a series of steps that may include motion or the manipulation of matter, and one that receives immediate ‘feedback’ from the environment. If you are trying to hammer a nail and not doing it properly, the environment will give you ‘feedback’ and prevent you from completing the physical operation. Perhaps you missed the nail. Perhaps you didn’t use a hammer. Maybe you didn’t hit the nail hard enough. Perhaps you used the wrong striking technique. The important idea is that it is easy for an instructor to observe the reason behind your failure because the entire process is overt and each step is observable. For cognitive operations, ‘there are no necessary overt behaviors to account for the outcome that is achieved’. Essentially, we do not know how an outcome has been reached unless all the individual steps are made overt. A student may have got lucky; they may have relied upon a rule that, while being effective in this instance, may end up causing problems as a sequence becomes more complex. Unlike with physical operations like hitting a nail with a hammer, the physical environment does not provide feedback for cognitive operations. This passage from Theory of Instruction explains this idea further:

‘The physical environment does not provide feedback when the learner is engaged in cognitive operations. If the learner misreads a word, the physical environment does nothing. It does not prevent the learner from saying the wrong word. It does not produce an unpleasant consequence. The learner could look at the word form and call it ‘Yesterday’ without receiving any response from the physical environment. The basic properties of cognitive operations-from long division to inferential reading-suggest both that the naïve learner cannot consistently benefit from unguided practice or from unguided discovery of cognitive operations. Unless the learner is provided with some logical basis for figuring out possible inconsistencies (which is usually not available to the naïve learner), practicing the skills without human feedback is likely to promote mistakes.’

Making the steps of a cognitive operation overt allows extremely precise feedback to be given, as the instructor can easily see the reason behind a particular outcome. During the acquisition phase of learning (the initial stage where, in the absence of teacher support, a student would soon become confused), precise and immediate feedback is of crucial importance, preventing errors from becoming embedded and ensuring accuracy is achieved. If students engage in ‘unguided practice or…unguided discovery’ then they will flounder; this is the central idea behind the influential Kirschner, Sweller and Clark paper which critiques one of the popular premises behind progressive education.

How does this apply to English?

1) In this post, I demonstrated an initial (flawed) communication sequence for teaching students to select and punctuate quotations, exemplifying how the covert can be made overt.

2) Precise feedback is vital: try making restrictive practice activities that isolate and focus on the specific content that is being taught: this post explains one way of doing these.

3) When students write sentences, ask them to feed back orally by saying/narrating the punctuation, turning the covert into the overt and allowing you to quickly and efficiently ascertain if they have succeeded without having to actually mark or look at their book. If appropriate, other students can listen and give feedback about accuracy too!


Teacher: Read out your absolute phrase sentence, narrating the punctuation.

Student: His paranoia growing COMMA Macbeth seems increasingly unstable and unhinged FULL STOP.

4) Live modelling: creating example sentences, paragraphs or even essays under the visualiser whilst narrating your thought process is a really powerful way of letting students observe the process of writing. When this process is made interactive with lots of teacher-student questioning, it can be even more effective.

2) The shift from simplified contexts to complex contexts.

simple to complex

In sport as much as in teaching, drills deliberately isolate elements to practise, increasing the amount of practice and decreasing cognitive load. Drills allow students to focus on ‘critical new learning’.

In Expressive Writing 2, one of the DI corrective writing schemes, one of the key skills that is taught is punctuating direct speech. Here is an overview of the ‘track’ (series of lessons) where this skill is taught and applied: this track spans 48 lessons! (See this post which explains the idea of a strand curriculum where multiple ‘tracks’ are intertwined over time).

1) Lessons 2 to 4: The rules for punctuating direct speech are introduced. Students are heavily directed by teachers and, following detailed and precise modelling of examples, have to complete isolated sentences of punctuated speech.

2) Lessons 5 to 9: Students write their own simple sentences that contain direct speech with minimal prompts. Sometimes they are statements; sometimes they are questions.

3) Lessons 10 to 12: Students edit and correct individual sentences that contain direct speech, adding in missing punctuation marks and capital letters.

4) Lessons 13 and 14: Students edit sentences in a paragraph that includes direct speech, adding in missing punctuation marks and capital letters.

5) Lessons 16 and 17: Students punctuate direct speech that includes two consecutive sentences.

6) Lesson 18: Students edit and correct an entire passage with two-sentence direct speech.

7) Lessons 24 to 28: Students punctuate direct speech that appears at the start of a sentence.

8) Lessons 33 to 37: Students punctuate sentences that include two different pieces of direct speech that are separated.

The practice activities gradually build in complexity in two ways: firstly, the subject matter and what is being taught slowly becomes more challenging; secondly, the context of the practice changes from isolated, supported drills to increasingly complex and contextualised activities. As the scheme progresses, students are increasingly asked to apply these component skills within a freer piece of writing, something that almost all of the lessons end with.

How does this apply to English?

Following this ‘shift’, I wrote here about how skills and items that are taught can be moved from simplified to more complex contexts.

3) The shift from prompted to unprompted formats.


According to the article, ‘In the early stages of instruction, formats include prompts to help focus students’ attention on important aspects of the item and to increase their success. These prompts are later systematically removed as students gain a skill. By the end of the instruction, students apply the skill without any prompts.’

How does this apply to English?

1) In the earlier stages of instruction, worked examples should be labelled clearly, identifying relevant bits and making clear to students which parts are important.

Here is an example:

6 skills

2) When teaching sentence styles, arrows and labelling can help make the implicit interactions and relationships between different components obvious to students.

Here is an example:

prompt grammar

4) The shift from massed practice to distributed practice.

According to the article, ‘Initially, students learn a new skill best when they have many practice opportunities in a short period of time. In later learning, retention is enhanced by practice opportunities conducted over a long period of time.’ During the acquisition stage of learning, it may be helpful to have multiple practice opportunities in order for students to become proficient with a concept. As the sequence progresses, this practice should become increasingly distributed.

How does this apply to English?

1) This post looks at an example lesson plan, exemplifying how concepts are initially taught via massed practice and then move to distributed practice, through regular and systematic retrieval practice.

2) This post looks at the journey of a test item across multiple lessons, again moving from massed to distributed practice.

5) The shift from immediate feedback to delayed feedback.

immediate delayed

At the beginning of an instructional sequence, feedback should be immediate and precise, preventing errors from becoming embedded. Although feedback is lauded as a universally positive thing (the implication being that the more you give, the better students will perform and learn), this notion is overly simplistic and erroneous. The EEF point out that ‘Feedback studies tend to show very high effects on learning. However, it also has a very high range of effects and some studies show that feedback can have negative effects and make things worse.’ Referencing Soderstrom and Bjork’s ‘Learning versus performance’ paper (accessible here), David Didau points out that ‘there is empirical evidence that “delaying, reducing, and summarizing feedback can be better for long-term learning than providing immediate, trial-by-trial feedback.”’ This last point seems to corroborate Engelmann’s idea that the optimum type of feedback will change according to the stage of instruction.

How does this apply to English?

1) Initial practice activities for all concepts (analytical skill, vocabulary, context knowledge, sentence styles, punctuation etc) should be restrictive and isolated drills, allowing precise and immediate feedback to be given. A teacher can either circulate, giving verbal feedback, or write a model and, by displaying it under the visualiser, provide students with an answer against which to check their own efforts.

2) Regular cumulative recap quizzes at the start of lessons provide the perfect opportunity for immediate feedback regarding spelling and conceptual understanding. Referring to the ‘hypercorrection effect’, the idea that ‘The more confident someone is that an incorrect answer is correct, the more likely they are not to repeat the error if they are corrected’, Dylan Wiliam explains that ‘The benefits of testing come from the retrieval practice that students get when they take the test, and the hypercorrection effect when they find out answers they thought were correct were in fact incorrect. In other words, the best person to mark a test is the person who just took it.’ Following this advice, we conduct retrieval practice under the visualiser, filling in the answers immediately after the quiz, and asking students to check and correct their own work.

6) The shift from an emphasis on the teacher’s role as a source of information to an emphasis on the learner’s role as a source of information.

This shift matches the idea of ‘I-we-you’, where responsibility and agency gradually move from teacher to student. This table from p.72 of Teach Like a Champion illustrates this shift:

i we you

How does this apply to English?

Let’s take an example from writing paragraphs:

1) The teacher writes a paragraph under the visualiser, narrating thought process and decisions. (The ‘I’ stage)

Using a semantic field of light, Dickens describes Scrooge’s room with words like ‘bright…gleaming…glistened’, symbolising warmth, comfort and opulence. Perhaps the ghost wants Scrooge to experience a convivial and celebratory scene so that Scrooge will not only realise that being as ‘solitary as an oyster’ is a bad choice, but that he could spend his money and enjoy himself instead.

  • If students have already acquired the analytical skills, you could ask them to identify them (3 part explanation/evidence in explanation/tentative language/multiple interpretations).
  • Quick fire questioning about vocabulary (convivial/opulence/semantic field) will provide valuable retrieval practice.


2) Teacher begins a second paragraph under the visualiser, asking students to help them complete it. (The ‘We’ stage)

Using a hyperbolic metaphor, Dickens describes the food as ‘…..

  • If you use the same structure as the first paragraph, students have a framework to follow (taking advantage of the alternation effect from Cognitive Load Theory).
  • The teacher can prompt students to use taught vocabulary or specific sentence styles.
  • When complete, you could undertake another round of quick fire questioning, again providing valuable retrieval practice.

3) Students now have two models to use as analogies. They should then write their own paragraph using different evidence but following the same process and structure. (The ‘You’ stage)


Next post: Insights from DI part 9-The Sequencing of Skills. What principles should guide how we order instruction?