What Gen AI Exposes About Written Assessment


Current debates in higher education about generative artificial intelligence (GenAI) in assessment are frequently framed around the search for a technical mechanism capable of detecting, attributing, or policing student writing.

This framing assumes that the central problem confronting institutions is the absence of sufficiently capable technology, rather than the compatibility of existing assessment practices with contemporary writing conditions.

The rapid uptake of so-called “AI detection” tools, proposals for watermarking GenAI outputs, and assessment process-tracking and post-hoc verification software reflects this assumption, despite its lack of evidentiary foundation. These responses do not address a newly created problem but instead recast a long-standing challenge in assessment—the limits of unsupervised written work as evidence of learning—as a technical deficit to be resolved through surveillance.

Although questions about what unsupervised written assessment can legitimately demonstrate have been raised for decades, the introduction of GenAI has removed the plausible deniability that previously insulated institutions from confronting these issues. Historically, uncertainty about the extent, visibility, or impact of assistance was treated as marginal or exceptional, allowing confidence in unsupervised written assessment to be maintained without systematic scrutiny.

The widespread availability of GenAI has removed those conditions by rendering assistance in unsupervised tasks routine, opaque, and impossible to delimit, forcing institutions to respond to questions of evidentiary validity that had long been deferred.

These conditions mean that institutional responses to GenAI that focus on attribution are a systematic misdiagnosis.

The fundamental issue is that written work produced under unsupervised conditions has, in isolation, never provided a defensible basis for the kinds of claims institutions wish to make about student learning, regardless of attribution.

Framing this challenge as a technology problem treats a failure of assessment design as a technical deficit, directing institutional attention toward surveillance-based interventions that cannot resolve the underlying validity problem.

The appeal of detection, watermarking, and process-tracking technologies is therefore understandable. They appear to offer institutions a way to preserve existing assessment practices while also responding decisively to GenAI. By promising attribution, traceability, or post-hoc verification, these tools suggest that the problem can be contained without reopening more difficult questions about assessment design, supervision, or the kinds of claims written tasks are meant to support. In reality, however, they function less as solutions than as stabilising devices, allowing institutions to defer disruptive change while signalling action.

This stabilisation comes at a cost. When the integrity of written assessment is treated primarily as a problem of identifying unauthorised assistance, responsibility is displaced from institutional assessment design onto individual students and staff. Students are positioned as potential violators to be monitored, while educators are asked to police compliance using tools whose outputs are probabilistic, opaque, and arbitrarily interpreted. The result is an escalation of surveillance and procedural burden without a corresponding increase in confidence about what student work actually evidences.

Seen in this light, current debates about AI in assessment are less about technology than about institutional appetite for change. The search for technology solutions reflects a preference for preserving existing assessment practices whose credibility rests on enforcement regimes that are both misdirected and ineffective, exposing institutions to regulatory risk by substituting surveillance for valid assessment.

Meaningful change would require institutions to concede that a core assessment practice has been epistemically fragile for a long time, and that GenAI has merely made this fragility visible.

Technology-based enforcement appears to offer a way to avoid that concession while also promising continuity. Assessment redesign requires institutions to face questions about what is being assessed, how learning is evidenced, how staff time is used, and what assurance actually means. That is harder, slower, and politically riskier than procuring new controls, regardless of their effectiveness.

The costs are already emerging, as institutions add more surveillance to assessment while becoming less confident that outcomes demonstrate learning. This develops into governance failure when enforcement substitutes for validity and institutions drift out of alignment with the requirement that assessment methods credibly demonstrate achievement of learning outcomes under the Higher Education Standards Framework (Standard 1.4.3). Over the longer term, successive technical fixes will entrench institutional inertia, raise the threshold for redesign, and weaken standards by delaying the only response capable of restoring assessment credibility: validity-led assessment design.

GenAI has not created a new assessment problem, but it has removed the conditions that once allowed institutions to avoid confronting an old one.

The persistence of enforcement-led responses reflects not a lack of alternatives, but a reluctance to acknowledge the limits of unsupervised written assessment as a basis for institutional claims about learning.

What has been exposed is a decision point rather than a technical gap: a choice between continuing to manage appearances through performative control and accepting the implications of what written assessment can and cannot validly evidence.

A/Prof Mark Bassett is Academic Lead, AI at Charles Sturt University
