
Research on what works in AI assessment design is “fragmented and difficult to navigate,” Thomas Corbin (Deakin U) and a compendium of colleagues have warned.
They propose research priorities, because these are needed, and now.
“Teachers … need to make decisions. They have students in class today and cannot put off decision making until a (likely non-existent) perfect solution is found,” they argue in what is less a paper than a work plan.
They propose basing research on four principles:
- Clarifying what assessment involving AI means: less on the “mechanics of verification,” more on the “broader educational purposes that assessment serves”
- Identifying different AI tasks: is the research on students using AI to check grammar, generate essay outlines, or “to simulate dialogue with historical figures”?
- Putting the tasks being researched in historical context: “we see studies rediscovering insights about collaboration, delegation, or textual borrowing that have been well understood in other contexts”
- And an ethical one: inclusion, equity, ethics and social justice are “necessary guiding principles.”
And they establish domains of inquiry into what matters, including:
- Why assess: “without research into why we assess, our responses to the AI challenge risk collapsing into piecemeal technical fixes that preserve measurement routines while eroding the moral and social grounds on which those routines are legitimated.”
- Who creates and who assesses work made with AI: “understanding who is involved in assessment, and on what grounds, will be crucial for ensuring that assessment remains a credible, recognitive, and publicly defensible practice of judgment.”
- What can and should be assessed: “every assessment embodies an assumption of purpose and a theory of what matters,” and AI “exposes how those theories are encoded in learning outcomes.”
- How and when to assess: AI complicates the translation of what learners do into what teachers see, and how and where it is graded
- Where assessment happens: “students may complete work in digital and AI-rich environments, teachers may design or grade within external tools, and universities may increasingly rely on platform analytics or credentialing systems to confirm achievement.”
The take-out: “the future of assessment will be defined by trade-offs rather than simple solutions. Research that takes up these domains can help to clarify what those trade-offs are, and what they mean in different contexts.”
This builds on Corbin and colleagues’ already influential September paper on pragmatic assessment of student work in the age of AI, which argues there are better and worse, but no correct, ways to do it (FC here).