Wednesday 26th June 2019, 12.30pm to 2.00pm
Speakers: Scott Slorach and Matt Wingfield
Location: Room H/G/21, Heslington Hall, Campus West
“My friend did the same and got a better mark.” Can Adaptive Comparative Judgment achieve more consistency in assessment than traditional academic judgment?
Consistency and equity are two of the University’s core principles on which assessment should be based. Marker meetings and guides, double-marking, and internal and external moderation are designed to assure those principles. However, students still raise issues of consistency of marking standards, whilst the academic judgment underlying a mark cannot be questioned.
Adaptive comparative judgment (ACJ) is an alternative assessment method. Rather than individual (or double) marking, a larger group of academics is presented, through an online system, with pairs of student scripts. Individually, they exercise comparative judgment – simply, which script is the better – against an agreed assessment statement. The system pairs scripts for comparison, initially at random, then systematically, based on an emerging ranking order that, it is claimed, progressively identifies a student’s position within their cohort more reliably.
In this workshop we shall consider in more detail: the theory underpinning ACJ; its claimed reliability and ongoing consistency as against traditional marking; the intellectual and organisational processes involved; and the types of assessment potentially most suited to exercising ACJ. We shall draw upon a comparative exercise undertaken at York Law School, where traditionally assessed formative assessments were re-marked using ACJ, raising interesting issues around both the results provided and the tutor experience. We shall also discuss student reaction to ACJ, and its additional potential for use by students.
This session was presented by Professor Scott Slorach, Director of Learning & Teaching at York Law School, and by Matt Wingfield, Business Development Consultant from RM Results. Matt and Scott shared their recent experiences of trialling Adaptive Comparative Judgement in the Law School.
Adaptive Comparative Judgement (ACJ) is a method of assessment marking motivated by the observation that humans do not respond consistently to stimuli, and that this has a significant impact on the reliability of assessment. The underlying law of comparative judgement was developed by Louis Thurstone in 1927. Thurstone observed that if you give someone an object and ask them to estimate its weight, the resulting estimates tend to be inaccurate and poorly repeatable. However, if you instead give people two objects and simply ask which is heavier, the results are much more accurate. By continuing with these pairwise comparisons, you can build up an ordering of a set of objects far more accurately than if you attempted to estimate the weight of each item individually. Thurstone demonstrated his law by asking subjects to rank the ‘scariness’ of a set of images. Despite fear being a very personal and subjective characteristic, the resulting ordering was robust to repetition, including repetition with different assessors.
Given this, ACJ was trialled by the Law School for marking a set of first year undergraduate essays. The ACJ assessment was supported by Matt and the CompareAssess software from RM Results. CompareAssess presents pairs of essays to the marker and simply asks which is better. No other qualification or justification is requested, only which is the better essay. During the assessment round, the software keeps track of its certainty about each essay and will dynamically present essays to markers more often until it is confident in the ranking of each script. At any point the marking team can get statistics on the certainty of the entire assessment and therefore determine how many marking rounds they might have to complete. During the workshop we were able to try out the software by making judgements on a mock English exam.
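The exact pairing and confidence algorithm inside CompareAssess is proprietary, but the general pattern of adaptive pairwise ranking can be sketched in Python. The sketch below is an illustrative assumption, not RM Results’ implementation: it uses a simple Elo-style rating update, pairs scripts at random in early rounds, and then pairs scripts with adjacent ratings in later rounds, mirroring the adaptive phase described above.

```python
import random

def elo_update(ratings, winner, loser, k=32):
    """Elo-style rating update after one pairwise judgement."""
    expected = 1 / (1 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
    ratings[winner] += k * (1 - expected)
    ratings[loser] -= k * (1 - expected)

def acj_rank(scripts, judge, rounds=20, seed=0):
    """Rank scripts from repeated pairwise judgements.

    `judge(a, b)` returns whichever of the two scripts is better.
    Early rounds pair scripts at random; later rounds pair scripts
    with adjacent ratings, a rough stand-in for the adaptive phase.
    """
    rng = random.Random(seed)
    ratings = {s: 1000.0 for s in scripts}
    for r in range(rounds):
        if r < rounds // 2:
            order = scripts[:]
            rng.shuffle(order)              # random phase
        else:
            order = sorted(scripts, key=ratings.get)  # adjacent-rating phase
        for a, b in zip(order[::2], order[1::2]):
            winner = judge(a, b)
            elo_update(ratings, winner, b if winner == a else a)
    return sorted(scripts, key=ratings.get, reverse=True)

# Demo with a perfectly consistent judge who always prefers the
# script with the higher hidden quality score.
quality = {"essay_%d" % i: i for i in range(8)}
judge = lambda a, b: max(a, b, key=quality.get)
ranked = acj_rank(list(quality), judge)
print(ranked)  # best essay first
```

With a consistent judge the ranking converges on the hidden quality ordering; in practice, human inconsistency is exactly what the repeated comparisons and the certainty tracking are there to absorb.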
Scott discussed how the experience of ACJ assessment compared to that of a more standard marking approach. Each individual judgement is relatively fast: on average just over 5 minutes. However, a much larger number of judgements is requested of each assessor. Overall, Scott found that the total time taken per marker was still slightly less than the time normally allocated for marking such a cohort, yet an overall assessment accuracy of 92% was achieved, which is higher than they generally observe. This is very encouraging, as it shows that accuracy can be significantly improved without a corresponding increase in time taken. The ACJ approach offers other advantages as well. The software automatically tracks the performance of each marker and can be used afterwards to give feedback to markers who differ significantly from the rest of the marking team.
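How such marker tracking might work can be illustrated with a toy statistic: the fraction of a judge’s decisions that contradict the final ranking. This is a hypothetical stand-in for the misfit statistics that ACJ software reports, with invented names, not CompareAssess’s actual measure.

```python
from collections import defaultdict

def judge_misfit(judgements, final_rank):
    """For each judge, the fraction of their decisions that
    contradict the final ranking (0.0 = fully consistent).

    `judgements` is a list of (judge, winner, loser) triples;
    `final_rank` maps each script to its position (0 = best).
    """
    total = defaultdict(int)
    wrong = defaultdict(int)
    for judge, winner, loser in judgements:
        total[judge] += 1
        if final_rank[winner] > final_rank[loser]:
            wrong[judge] += 1  # judged against the final ranking
    return {j: wrong[j] / total[j] for j in total}

rank = {"A": 0, "B": 1, "C": 2}
log = [("j1", "A", "B"), ("j1", "B", "C"),
       ("j2", "C", "A"), ("j2", "B", "C")]
print(judge_misfit(log, rank))  # {'j1': 0.0, 'j2': 0.5}
```

A marker with a high misfit score is a candidate for the kind of post-hoc feedback the session described.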
However, there are downsides to such an approach. Firstly, many of the benefits of the system come from having a teaching team (in this case five lecturers) working on the cohort, and a first year module was chosen for this reason. Quite often, particularly in the sciences, it can be hard to find five academics capable of marking a third year specialist optional module. The approach would still work with only one assessor, but this format was not tested. There are also a number of questions that remain to be answered. Student feedback still has to be produced separately after the process, so this must be accounted for. Finally, the output of ACJ is an absolute ranking of the essays; this then has to be mapped onto a range of grades, and there are many different approaches to doing so.
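As a purely hypothetical illustration of one such mapping (not the approach used in the trial), a few benchmark scripts could be given agreed marks by moderators, with the remaining scripts assigned marks by linear interpolation on their rank position:

```python
def grades_from_ranking(ranked, anchors):
    """Assign marks to a best-first ranked list of scripts by
    linear interpolation between anchor scripts with known marks.

    `anchors` maps a few script ids (including the first- and
    last-ranked scripts) to agreed benchmark marks.
    """
    positions = {s: i for i, s in enumerate(ranked)}
    anchor_pts = sorted((positions[s], mark) for s, mark in anchors.items())
    marks = {}
    for s, pos in positions.items():
        # find the surrounding pair of anchors and interpolate
        for (p0, m0), (p1, m1) in zip(anchor_pts, anchor_pts[1:]):
            if p0 <= pos <= p1:
                frac = (pos - p0) / (p1 - p0) if p1 > p0 else 0
                marks[s] = round(m0 + frac * (m1 - m0))
                break
    return marks

ranked = ["A", "B", "C", "D", "E"]       # ACJ output, best first
anchors = {"A": 78, "C": 62, "E": 45}    # moderated benchmark marks
print(grades_from_ranking(ranked, anchors))
# {'A': 78, 'B': 70, 'C': 62, 'D': 54, 'E': 45}
```

Other schemes are equally plausible (fitting the rating scale itself to grade boundaries, say), which is precisely why the mapping step remains an open question.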