Research

Aligning Language Models with Human Values

S. de Vries, D. Bakker · EACL 2025

We introduce a framework for steering large language models toward explicitly stated human values rather than implicit preferences inferred from data. Across three benchmarks, the method reduces value-laden errors without degrading task performance. We also discuss the governance questions that arise when "which values" becomes a design choice.

Auditing Algorithmic Decision-Making in the Public Sector

J. Meijer, S. de Vries · AI & Society, 2025

We report on audits of automated decision systems used by Dutch public bodies. Drawing on case studies in benefits and fraud detection, we identify recurring failure modes and propose a practical audit protocol that agencies can apply before deployment.

Transparency and Trust in Generative AI

F. El Amrani, L. Hoffmann · ACM CHI 2024

How much should a generative system reveal about its own uncertainty? In a study with 240 participants, we find that calibrated confidence cues improve appropriate trust, while verbose explanations often backfire. We offer design guidance for communicating model limitations.