In defence of algorithms

Submitted by martin on 27 August, 2020 - 9:29 Author: Martin Thomas

As the editorial in Solidarity 560 said, the "condemnation of 'algorithms' [around the exam grades fiasco] was unfair on algorithms. If algorithms do not produce reasonable results, it is because humans have messed up".

The point is worth a few further words, because hostility to "algorithms" in general can become obscurantist technophobia.

An algorithm is a precisely-defined list of instructions for doing a calculation or, for that matter, for baking a cake (in which case it is called a recipe). No more, no less.

Algorithms can be badly done, as the ones for grades this year were. They can be used to obfuscate and evade responsibility, or to hide prejudices which shaped the algorithm in a fog of technicalities - especially so if they are kept under wraps, as the grades algorithms were.

But it would be bad, not good, for student gradings (for example) to be decided by personal judgements rather than by objective rules. In fact, one of the advantages of algorithms is that they can be made public, and those concerned about the calculation can know in advance what they're dealing with, rather than e.g. just being told by the adjudicator "I felt you were a C rather than an A. When you get to be as experienced as me, you can judge such things".

Deciding grades by exams is also an algorithm. In England, you add up the marks, compare them to a list of cut-offs, and allot A, B, C, etc. accordingly. The mark scheme may be poor or ill-defined, but at least there is none of "I felt this 80% was not as good as the other 80%, so it doesn't deserve a B".
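That rule is simple enough to write out in full. Here is a minimal sketch in Python; the cut-offs are invented for illustration, not the real A-level boundaries:

```python
# Illustrative grade boundaries only, not the real A-level cut-offs.
CUT_OFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (50, "E")]

def grade(total_mark):
    """Compare a total percentage mark to the cut-offs and allot a grade."""
    for cut_off, letter in CUT_OFFS:
        if total_mark >= cut_off:
            return letter
    return "U"  # unclassified

print(grade(80))  # "B" - and every 80% gets the same "B"
```

Written out like that, there is nothing left to "feel": the same mark gets the same grade every time, and anyone can check the rule.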

There could be other algorithms. For example, in another school system where I've worked, marks on maths papers are reported on three distinct dimensions. A high grade depends on getting a good score in one particular dimension ("modelling and problem-solving") as well as a good total mark; a pass depends only on the total.
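A rule like that can also be stated exactly. This is a sketch only: the thresholds and the other dimension names are my assumptions, with just "modelling and problem-solving" taken from the scheme described above:

```python
# Thresholds and dimension names other than "modelling and problem-solving"
# are invented for illustration.
def grade_maths(scores):
    total = sum(scores.values())
    if total < 50:
        return "fail"
    # A pass depends only on the total; a high grade also needs a
    # good score on one particular dimension.
    if total >= 85 and scores["modelling and problem-solving"] >= 25:
        return "high"
    return "pass"

print(grade_maths({"fluency": 30, "reasoning": 32,
                   "modelling and problem-solving": 28}))  # "high"
```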

For another example, in some systems students are graded on the total of their better results, rather than on the overall total. If they have had a minority of "epic failures", those can be thrown away.
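That rule, too, takes only a few lines to state precisely (how many results to keep is my assumption, not any particular system's figure):

```python
# "Best n" aggregation: grade on the better results, discard the rest.
def total_of_best(results, keep):
    """Total the `keep` best results; the "epic failures" are thrown away."""
    return sum(sorted(results, reverse=True)[:keep])

print(total_of_best([72, 68, 75, 12, 70], keep=4))  # 285 - the 12 is discarded
```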

For yet another example, in some systems maths exams are set so that even the most successful students may get 60% or 70%, rather than 90%-plus being required for an A, as in English A-levels (where just one or two nervous fumbles from exam stress will bar you from the top grade).

And still another: in some systems it is a rule that students must be given back their test papers and be able to query and challenge the marking before the marks are finalised.

I think all those make for better algorithms. But the point is that they are explicit and checkable rules and procedures, not just matters of an individual reporting their feelings.

One of the virtues of defining an algorithm is that it brings out in public, for debate, how assessments are made. And no algorithm at all - just impressionistic personal judgement - is definitely bad. It is like allocating jobs only on the impression made in unstructured interviews, a method known to maximise the impact of the interviewers' (maybe inadvertent) biases, and to disadvantage the awkward, the nervous, the atypical.

It is no better than saying that recipes should be banned and all cakes should be baked on the basis of what feels good at the time.

Comments

Submitted by Pat Yarker (not verified) on Mon, 31/08/2020 - 16:25

Martin writes: 'But it would be bad, not good, for student gradings (for example) to be decided by personal judgements rather than by objective rules.' In relation to high stakes summative exams I disagree, though I think two separate issues may have become conflated here.

There have been attempts to eliminate 'personal judgement' from public summative assessment in England. For example, in 2006 the mark-scheme for the Key Stage 2 Writing SAT boiled down to a tick-box list of technical features: a mark for using a semi-colon or for deploying the subjunctive. Deliberately, the mark-scheme had nothing to say about the degree of imagination your writing showed or the level of interest it evinced in the reader. These central features of a piece of creative writing were left unaddressed. Teachers were quite properly outraged, for a qualitative or interpretative dimension is necessarily part and parcel of summative assessment in many curriculum areas. Coming to a reasonable determination in such cases indeed depends importantly on experience, provided that experience is reflected on, informed by the views of fellow-practitioners (for example through processes of moderation) and weighed up in the light of advice by those who can justly claim authority (senior examiners, for example, who have engaged intensely over long experience and have reflected much).

That is why departments in secondary schools will undertake quite extensive processes of trial-marking, internal moderation (a form of triangulation of judgement), assessor 'training' by senior examiners and so on. Practitioners will also reflect on the official reports written on exam papers and individual answers in order to refine their approach in the future. To present exam assessment as done on a whim or superficially is a caricature.

Martin's apparent rejection of the value of considered appraisal founded on experience in matters of assessment, and of the importance of informed individual judgement rather than marking-by-numbers, risks aligning him with those who clamour for an increase in machine-marking of exam-scripts, and hence in the further narrowing-down of exam-questions (and hence pedagogic tactics) in order to suit 'algorithmic' approaches to marking. In reality, this means more reductive multiple-choice Q&A of the kind prevalent in the USA.

Martin is right to argue that transparency is vital in public assessment. Students and teachers need to know what the rules are, how the whole process works, the criteria on which judgement will be based and by which a grade or mark will be arrived at, and what the grounds for appeal are. That is, everyone needs to know the objective rules of the game, that these are as fair as may be, and that they can be appealed to and revised if required.

But it is not possible to eliminate the human element in assessing exam material in subjects such as English Literature, Drama, Art or History while still retaining the subject's integrity and offering students an exam-course it is educationally valuable to undertake. A mark scheme is an algorithm in the sense that it is 'a precisely defined set of instructions'. But it is not possible to anticipate precisely, and hence to define, what a student may present by way of an answer to such questions as 'How successful was the 1945 Labour Government in introducing the Welfare State?' or 'Make an artwork entitled 'Self Portrait'' or 'Who or what do you blame for the deaths of Romeo and Juliet, and why?' Nor is it possible definitively to state everything which should or should not be rewarded in a response to such questions. There must be room for judgement and evaluation which is rationally founded and which is open to challenge.

Better, I think, to separate out the qualitative/interpretative aspects of assessment, in those subjects where, because of their nature, they are evidently necessary, from the valuable role an algorithm may play in clarifying the steps in the overall process and the general rules of the game. Assessment criteria do not necessarily have to be algorithmic in order for evidently fair and reasonable (though of course not 'objectively certain') evaluation to be made.

Submitted by martin on Mon, 07/09/2020 - 14:02

According to recent reports, when maths A-level papers are re-marked, only 4% end up with a changed grade. The assessment is bad, in my view, but at least in a consistent way which, because it is algorithmic, is open to discussion and review.

In subjects like English Literature, Drama, Art or History, the figure is 40-odd per cent. I'd say the answer is just not to have school exams (or, probably, university exams) in those subjects.

"Diagnostic testing" in schools is useful. That can be reported to the student as the teacher's judgement, subject to being queried by the student and with the option of being checked by another teacher's. It doesn't have to be a grade. (When I was teaching in Queensland, Australia, we were, for a while anyway, banned from giving our assessments false "objectivity" with a numerical grade, even in maths).

"Summative testing" is important to check whether people are qualified to do jobs, as doctors or electricians or bus drivers or such. It should be as algorithmic as possible, pass/ fail, and open for trying again if you have a bad day.

No good purpose is served by exams in schools. At their best (and most of them are far from their best) they test the ability to do... the sort of exams that are set, which have very little correlation with anything in "real life".

To decide which of two novels to read (or, as a publisher, which of two scripts to publish, or, as the organiser of a poetry reading, which of two versifiers to invite), would you go by a report that author X had got an A in an Eng Lit exam and author Y only a B? You would not, and you should not.
