Dirk Hovy
Paul Röttger
Latest
Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models
My Answer is C: First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
The Ecological Fallacy in Annotation: Modeling Human Label Variation goes beyond Sociodemographics
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks