Paul Röttger

Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals' Subjective Text Perceptions
Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models
My Answer is C: First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
The Ecological Fallacy in Annotation: Modeling Human Label Variation goes beyond Sociodemographics
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks