NLP

Consistency is Key: Disentangling Label Variation in Natural Language Processing with Intra-Annotator Agreement

We commonly use agreement measures to assess the utility of judgements made by human annotators in Natural Language Processing (NLP) tasks. While inter-annotator agreement is frequently used as an indication of label reliability by measuring …
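For readers unfamiliar with the agreement measures the abstract refers to, below is a minimal sketch of one common inter-annotator agreement statistic (Cohen's kappa) on a toy binary labeling task. The labels and helper names are illustrative only and are not data or code from the paper.

```python
# Minimal sketch of Cohen's kappa, a common inter-annotator agreement measure.
# The two label lists below are toy examples, not data from the paper.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled the same.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labeled at random,
    # each following their own observed label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

annotator_1 = ["toxic", "ok", "ok", "toxic", "ok", "ok"]
annotator_2 = ["toxic", "ok", "toxic", "toxic", "ok", "ok"]
print(f"Cohen's kappa: {cohens_kappa(annotator_1, annotator_2):.2f}")  # 0.67
```

The paper's focus, intra-annotator agreement, can be computed with the same kind of statistic by comparing one annotator's labels across repeated annotation rounds rather than comparing two different annotators.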

No for Some, Yes for Others: Persona Prompts and Other Sources of False Refusal in Language Models

Large language models (LLMs) are increasingly integrated into our daily lives and personalized. However, LLM personalization might also increase unintended side effects. Recent work suggests that persona prompting can lead models to falsely refuse …
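As context for the term, persona prompting typically means prepending a system message that asks the model to adopt a particular identity. The toy example below (persona and request invented here, not taken from the paper) shows the shape of such a prompt next to a plain one.

```python
# Toy illustration of persona prompting: the same benign request sent with
# and without a persona system message. The persona and request are invented
# for illustration and are not from the paper.
plain_messages = [
    {"role": "user", "content": "Summarize the plot of Romeo and Juliet."},
]

persona_messages = [
    {"role": "system", "content": "You are Maria, a 68-year-old retired teacher from Naples."},
    {"role": "user", "content": "Summarize the plot of Romeo and Juliet."},
]
# Either message list can be sent to any chat-style API; the concern raised in
# the abstract is cases where the persona variant triggers a refusal even
# though the underlying request is harmless.
```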

IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance

Large language models (LLMs) are helping millions of users write texts about diverse issues, and in doing so expose users to different ideas and perspectives. This creates concerns about issue bias, where an LLM tends to present just one perspective …

Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals' Subjective Text Perceptions

People naturally vary in their annotations for subjective questions, and some of this variation is thought to be due to the person's sociodemographic characteristics. LLMs have also been used to label data, but recent work has shown that models …

Educators' Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-only Setting

The rapid development of Large Language Models (LLMs) opens up the possibility of using them as personal tutors. This has led to the development of several intelligent tutoring systems and learning assistants that use LLMs as back-ends with various …

Socially Aware Language Technologies: Perspectives and Practices

Language technologies have advanced substantially, particularly with the introduction of large language models. However, these advancements can exacerbate several issues that models have traditionally faced, including bias, evaluation, and risk. In …

Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models

As diverse linguistic communities and users adopt large language models (LLMs), assessing their safety across languages becomes critical. Despite ongoing efforts to make LLMs safe, they can still be made to behave unsafely with jailbreaking, a …

My Answer is C: First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models

The open-ended nature of language generation makes the evaluation of autoregressive large language models (LLMs) challenging. One common evaluation approach uses multiple-choice questions to limit the response space. The model is then evaluated by …
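As a rough illustration of the first-token evaluation approach the title refers to (not the paper's actual code), the sketch below contrasts the probability a model assigns to the option letters at the first generated position with the answer it gives when allowed to generate text freely. The model name and question are placeholders.

```python
# Illustrative sketch of first-token probability scoring for a multiple-choice
# question, contrasted with the model's free-form text answer.
# Model name and prompt are placeholders, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = (
    "Question: Which planet is known as the Red Planet?\n"
    "A. Venus\nB. Jupiter\nC. Mars\nD. Mercury\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt")

# First-token evaluation: read off the probability mass assigned to each
# option letter at the very first generated position.
with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]
probs = torch.softmax(next_token_logits, dim=-1)
option_ids = {opt: tokenizer.encode(" " + opt, add_special_tokens=False)[0]
              for opt in "ABCD"}
first_token_pick = max(option_ids, key=lambda opt: probs[option_ids[opt]].item())

# Text-answer evaluation: let the model actually generate and inspect the text.
generated = model.generate(**inputs, max_new_tokens=20, do_sample=False)
text_answer = tokenizer.decode(generated[0, inputs["input_ids"].shape[1]:],
                               skip_special_tokens=True)

print("first-token pick:", first_token_pick)
print("generated answer:", text_answer)
```

The mismatch the title points to arises when these two readouts disagree, e.g. the highest-probability first token is one letter while the generated text names a different option or declines to pick one.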

Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models

Much recent work seeks to evaluate values and opinions in large language models (LLMs) using multiple-choice surveys and questionnaires. Most of this work is motivated by concerns around real-world LLM applications. For example, politically-biased …

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models

Without proper safeguards, large language models will readily follow malicious instructions and generate toxic content. This risk motivates safety efforts such as red-teaming and large-scale feedback learning, which aim to make models both helpful …