Assessment and the black swan problem, by Terry Freedman

Although it’s perfectly understandable why politicians in the UK allowed teacher assessment to replace formal examinations, it’s not an ideal solution. In my experience of dealing with teacher assessment in the areas of ICT and Computing, the following issues surfaced.

Lack of agreement about the criteria

High-stakes exams like ‘A’ Levels come with marking schemes and rubrics. Having a detailed scheme or rubric containing detailed statements sounds like a good idea. However, different people interpret them differently. I go into this in more detail in The Trouble With Rubrics, but here is one example. Suppose a statement in the rubric is:

Has tested the program with end users and made adjustments according to their feedback.

Certain questions arise immediately:

What constitutes “testing”? For example, did it involve asking people to complete a questionnaire, or use the program to solve a problem, or what?
How many users are enough?
Who were the users? Friends, family, people selected at random?
What sort of adjustments? Are all adjustments equal?

It’s obvious that different teachers are going to interpret that statement differently. It’s perhaps less obvious, but true, that the same teacher is likely to interpret it differently according to factors like time of day, how tired they are, whether they’ve just eaten.

This sort of thing is ironed out at moderation meetings if the assessment has been done through a formal examination process. At some point, after discussion or dealing with issues arising on an ad hoc basis, a decision is made at a high level so that no individual marker has to interpret the statement. All they have to do is check whether the student has satisfied the agreed interpretation of the criteria.

I discovered while working as a Principal Officer in ICT at the Qualifications and Curriculum Authority (QCA) that teachers continue to interpret criteria differently from each other no matter how much guidance is provided. We had detailed statements for each Level (as they were then), and examples of pupils’ work at each Level, along with a commentary on why and how the criteria had been met, and teachers still interpreted the criteria in their own way. That wasn’t because they didn’t know what they were doing, but simply because there was no opportunity for extended discussion leading, hopefully, to agreement about what the criteria meant and what constituted good evidence. Whenever I was able to engage in such discussions with groups of teachers, everything was fine.

Teacher assessment methods differ from each other

When I was a Head of Computing in the late 1990s, the local authority collected in the teachers’ grades for ICT at the end of Key Stage 3. For non-Brits that means at the end of the first three years of secondary (high) school.

It was a ridiculous exercise because each school’s Head of ICT/Computing had arrived at their grades in a different way. For instance:

I gave the students a project to do over 6 weeks, and based my grades on how well they did in that.
Another school set the students a one-hour exam.
Another Head of Department graded the students according to how they had performed over the course of the term.

Teachers’ assessments tend to err on the high side

I think this arises for the following reasons:

A predilection for giving their students the benefit of the doubt if they’re not sure.
The pressure to demonstrate that students have made progress. Given the pressure on teachers and schools to show constant progress (or, as Tony Blair seemed to imply, to make sure that all secondary schools are above average), there is a perverse incentive to err on the high side when it comes to assessing your students’ attainment.

Reports from schools and local authorities may not be worth the paper they’re written on

I’m fairly sceptical of reports stating what wonderful progress has been made when their statistics are based on teacher-assessed grades, for the reasons stated, and also because of a couple of glimpses behind the curtain.

Two days after I started work as an ICT advisor in a local authority, my line manager’s line manager came into the office and asked me if I could analyse the Key Stage 3 ICT grades and write a progress report.

“I’ve looked at the grades, and you can’t compare like with like,” I replied. “So they’re basically a load of rubbish.”

“Well, if you could write a report on progress made, that would be great.”

“There hasn’t been any progress, and the results aren’t worth the paper they’re written on.”

“Now, Terry”, he said, placing a hand on my shoulder. “That could be construed as a career-limiting statement. I’m sure you could find evidence of progress if you looked hard enough.”

“One school’s grades have risen by 0.5% over last year’s results. All the others have gone down.”

“There you are! Significant progress has been made.”

The other glimpse behind the curtain was when it became evident that some schools’ processes were, shall we say, not working as well as they might. I received a phone call from an irate headteacher demanding to know why I had stated that their average mark for ICT was 18%.

“I stated it because that’s what is on the information I have received.”

“Where did that information come from?”

“You.”

“Me?”

“Yes, it’s a report from your school, signed by you, so I assumed you were aware of what was written in the report.”

Concluding remarks

The exam system is by no means foolproof, but in my opinion formal exams tend to be more valid and reliable than the alternatives. Teacher assessment is fine for formative assessment purposes, but when it comes to high-stakes assessment it’s not fit for purpose.

If you found this article interesting and useful, why not subscribe to my newsletter, Digital Education? Subscribers get free access to an essay I wrote entitled Assessment and the Black Swan Problem. The newsletter has been going since the year 2000, and has news, views and reviews for Computing and ed tech teachers.