Paul Röttger

No for Some, Yes for Others: Persona Prompts and Other Sources of False Refusal in Language Models
Principled Personas: Defining and Measuring the Intended Effects of Persona Prompting on Task Performance
IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance
Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals' Subjective Text Perceptions
Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models
My Answer is C: First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
The Ecological Fallacy in Annotation: Modeling Human Label Variation goes beyond Sociodemographics
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks