-
Aligning Language Models with Human Values
We introduce a framework for steering large language models toward explicitly stated human values rather than implicit preferences inferred from data. Across three benchmarks, the method reduces value-laden errors without degrading task performance, and we discuss the governance questions that arise when "which values" becomes a design choice.
-
Auditing Algorithmic Decision-Making in the Public Sector
We report on audits of automated decision systems used by Dutch public bodies. Drawing on case studies in benefits and fraud detection, we identify recurring failure modes and propose a practical audit protocol that agencies can apply before deployment.
-
Transparency and Trust in Generative AI
How much should a generative system reveal about its own uncertainty? In a study with 240 participants, we find that calibrated confidence cues improve appropriate trust while verbose explanations often backfire, and we offer design guidance for communicating model limitations.