Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models

Much recent work seeks to evaluate values and opinions in large language models (LLMs) using multiple-choice surveys and …

Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schuetze, Dirk Hovy

My Answer is C: First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models

The open-ended nature of language generation makes the evaluation of autoregressive large language models (LLMs) challenging. One …

Xinpeng Wang, Bolei Ma, Chengzhi Hu, Leon Weber-Genzel, Paul Röttger, Frauke Kreuter, Dirk Hovy, Barbara Plank

Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models

As diverse linguistic communities and users adopt large language models (LLMs), assessing their safety across languages becomes …

Fabio Pernisi, Dirk Hovy, Paul Röttger

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models

Without proper safeguards, large language models will readily follow malicious instructions and generate toxic content. This risk …

Paul Röttger, Hannah Rose Kirk, Bertie Vidgen, Giuseppe Attanasio, Federico Bianchi, Dirk Hovy

DADIT: A Dataset for Demographic Classification of Italian Twitter Users and a Comparison of Prediction Methods

Social scientists increasingly use demographically stratified social media data to study the attitudes, beliefs, and behavior of the …

Lorenzo Lupo, Paul Bose, Mahyar Habibi, Dirk Hovy, Carlo Schwarz

Donya Rooein, Paul Röttger, Anastassia Shaitarova, Dirk Hovy

Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts

Using large language models (LLMs) for educational applications like dialogue-based teaching is a hot topic. Effective teaching, …

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety

The last two years have seen a rapid growth in concerns around the safety of large language models (LLMs). Researchers and …

Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy

Flor Miriam Plaza-del-Arco, Alba Curry, Amanda Cercas Curry, Dirk Hovy

Emotion Analysis in NLP: Trends, Gaps and Roadmap for Future Directions

Emotions are a central aspect of communication. Consequently, emotion analysis (EA) is a rapidly growing field in natural language …

Conversations as a Source for Teaching Scientific Concepts at Different Education Levels

Open conversations are one of the most engaging forms of teaching. However, creating those conversations in educational software is a …

Donya Rooein, Dirk Hovy

Flor Miriam Plaza-del-Arco, Amanda Cercas Curry, Alba Curry, Gavin Abercrombie, Dirk Hovy

Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution

Large language models (LLMs) reflect societal norms and biases, especially about gender. While societal biases and stereotypes have …

Amanda Cercas Curry, Zeerak Talat, Dirk Hovy

Impoverished Language Technology: The Lack of (Social) Class in NLP

Since Labov’s (1964) foundational work on the social stratification of language, linguistics has dedicated concerted efforts …

Amanda Cercas Curry, Giuseppe Attanasio, Zeerak Talat, Dirk Hovy

Classist Tools: Social Class Correlates with Performance in NLP

Since the foundational work of William Labov on the social stratification of language (Labov, 1964), linguistics has made concentrated …

Flor Miriam Plaza-del-Arco, Debora Nozza, Dirk Hovy

Wisdom of Instruction-Tuned Language Model Crowds. Exploring Model Label Variation

Large Language Models (LLMs) exhibit remarkable text classification capabilities, excelling in zero- and few-shot learning (ZSL and …

What about “em”? How Commercial Machine Translation Fails to Handle (Neo-)Pronouns

As 3rd-person pronoun usage shifts to include novel forms, e.g., neopronouns, we need more research on identity-inclusive NLP. …

Anne Lauscher, Debora Nozza, Ehm Miltersen, Archie Crowley, Dirk Hovy

The State of Profanity Obfuscation in Natural Language Processing Scientific Publications

Work on hate speech has made considering rude and harmful examples in scientific publications inevitable. This situation raises various …

Debora Nozza, Dirk Hovy

The Ecological Fallacy in Annotation: Modeling Human Label Variation goes beyond Sociodemographics

Many NLP tasks exhibit human label variation, where different annotators give different labels to the same texts. This variation is …

Matthias Orlikowski, Paul Röttger, Philipp Cimiano, Dirk Hovy

Gavin Abercrombie, Dirk Hovy, Vinodkumar Prabhakaran

Temporal and Second Language Influence on Intra-Annotator Agreement and Stability in Hate Speech Labelling

Much work in natural language processing (NLP) relies on human annotation. The majority of this implicitly assumes that annotator’s …

Flor Miriam Plaza-del-Arco, Debora Nozza, Dirk Hovy

Respectful or Toxic? Using Zero-Shot Learning with Language Models to Detect Hate Speech

Hate speech detection faces two significant challenges: 1) the limited availability of labeled data and 2) the high variability of hate …

Amanda Cercas Curry, Giuseppe Attanasio, Debora Nozza, Dirk Hovy

MilaNLP at SemEval-2023 Task 10: Ensembling Domain-Adapted and Regularized Pretrained Language Models for Robust Sexism Detection

We present the system proposed by the MilaNLP team for the Explainable Detection of Online Sexism (EDOS) shared task. We propose an …

Leveraging Social Interactions to Detect Misinformation on Social Media

Detecting misinformation threads is crucial to guarantee a healthy environment on social media. We address the problem using the data …

Tommaso Fornaciari, Luca Luceri, Emilio Ferrara, Dirk Hovy

Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP)

Natural Language Processing has seen impressive gains in recent years. This research includes the demonstration by NLP models to have …

Sunipa Dev, Vinodkumar Prabhakaran, David Adelani, Dirk Hovy, Luciana Benotti

Can Demographic Factors Improve Text Classification? Revisiting Demographic Adaptation in the Age of Transformers

Demographic factors (e.g., gender or age) shape our language. Previous work showed that incorporating demographic factors can …

Chia-chien Hung, Anne Lauscher, Dirk Hovy, Simone Paolo Ponzetto, Goran Glavaš

Donya Rooein, Amanda Cercas Curry, Dirk Hovy

Know Your Audience: Do LLMs Adapt to Different Age and Education Levels?

Large language models (LLMs) offer a range of new possibilities, including adapting the text to different audiences and their reading …

Beyond Digital 'Echo Chambers': The Role of Viewpoint Diversity in Political Discussion

Increasingly taking place in online spaces, modern political conversations are typically perceived to be unproductively …

Rishav Hada, Amir Ebrahimi Fard, Sarah Shugars, Federico Bianchi, Patricia Rossini, Dirk Hovy, Rebekah Tromble, Nava Tintarev

Federico Bianchi, Amanda Cercas Curry, Dirk Hovy

Viewpoint: Artificial Intelligence Accidents Waiting to Happen?

Artificial Intelligence (AI) is at a crucial point in its development: stable enough to be used in production systems, and increasingly …

Federico Bianchi, Vincenzo Cutrona, Dirk Hovy

Twitter-Demographer: A Flow-based Tool to Enrich Twitter Data

Twitter data have become essential to Natural Language Processing (NLP) and social science research, driving various scientific …

It's Not Just Hate: A Multi-Dimensional Perspective on Detecting Harmful Speech Online

Well-annotated data is a prerequisite for good Natural Language Processing models. Too often, though, annotation decisions are governed …

Federico Bianchi, Stefanie Hills, Patricia Rossini, Dirk Hovy, Rebekah Tromble, Nava Tintarev

Anne Lauscher, Federico Bianchi, Samuel R. Bowman, Dirk Hovy

SocioProbe: What, When, and Where Language Models Learn about Sociodemographics

Pre-trained language models (PLMs) have outperformed other NLP models on a wide range of tasks. Opting for a more thorough …

Marius Hessenthaler, Emma Strubell, Dirk Hovy, Anne Lauscher

Bridging Fairness and Environmental Sustainability in Natural Language Processing

Fairness and environmental impact are important research directions for the sustainable development of artificial intelligence. …

Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages

Hate speech is a global phenomenon, but most hate speech datasets so far focus on English-language content. This hinders the …

Paul Röttger, Debora Nozza, Federico Bianchi, Dirk Hovy

Giuseppe Attanasio, Debora Nozza, Federico Bianchi, Dirk Hovy

Is It Worth the (Environmental) Cost? Limited Evidence for the Benefits of Diachronic Continuous Training

Language is constantly changing and evolving, leaving language models to quickly become outdated, both factually and linguistically. …

Anne Lauscher, Archie Crowley, Dirk Hovy

Welcome to the Modern World of Pronouns: Identity-Inclusive Natural Language Processing beyond Gender

The world of pronouns is changing – from a closed word class with few members to an open set of terms to reflect identities. However, …

Guiding the Release of Safer E2E Conversational AI through Value Sensitive Design

Over the last several years, end-to-end neural conversational agents have vastly improved their ability to carry unrestricted, …

A. Stevie Bergman, Gavin Abercrombie, Shannon Spruit, Dirk Hovy, Emily Dinan, Y-Lan Boureau, Verena Rieser

Tommaso Fornaciari, Alexandra Uma, Massimo Poesio, Dirk Hovy

Hard and Soft Evaluation of NLP models with BOOtSTrap SAmpling - BooStSa

Natural Language Processing (NLP)’s applied nature makes it necessary to select the most effective and robust models. Producing …

Federico Bianchi, Debora Nozza, Dirk Hovy

Language Invariant Properties in Natural Language Processing

Meaning is context-dependent, but many properties of language (should) remain the same even if we transform the context. For example, …

Federico Bianchi, Debora Nozza, Dirk Hovy

XLM-EMO: Multilingual Emotion Prediction in Social Media Text

Detecting emotion in text allows social and computational scientists to study how people behave and react to online events. However, …

Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks

Labelled data is the foundation of most natural language processing tasks. However, labelling data is difficult and there often are …

Paul Röttger, Bertie Vidgen, Dirk Hovy, Janet B. Pierrehumbert

Debora Nozza, Federico Bianchi, Dirk Hovy

Pipelines for Social Bias Testing of Large Language Models

The maturity level of language models is now at a stage in which many companies rely on them to solve various tasks. However, while …

Measuring Harmful Sentence Completion in Language Models for LGBTQIA+ Individuals

Current language technology is ubiquitous and directly influences individuals’ lives worldwide. Given the recent trend in AI on …

Debora Nozza, Federico Bianchi, Anne Lauscher, Dirk Hovy

Benchmarking Post-Hoc Interpretability Approaches for Transformer-based Misogyny Detection

Transformer-based Natural Language Processing models have become the standard for hate speech detection. However, the unconscious use …

Giuseppe Attanasio, Debora Nozza, Eliana Pastor, Dirk Hovy

SAFETYKIT: First Aid for Measuring Safety in Open-domain Conversational Systems

The social impact of natural language processing and its applications has received increasing attention. In this position paper, we …

Emily Dinan, Gavin Abercrombie, A. Stevie Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau, Verena Rieser

Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists

Natural Language Processing (NLP) models risk overfitting to specific terms in the training data, thereby reducing their performance, …

Giuseppe Attanasio, Debora Nozza, Dirk Hovy, Elena Baralis

Text Analysis in Python for Social Scientists – Prediction and Classification

Text contains a wealth of information about a wide variety of sociocultural constructs. Automated prediction methods can infer …

Dirk Hovy

Learning from Disagreement: A Survey

Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer evidence that humans disagree, from objective tasks such …

Alexandra N Uma, Tommaso Fornaciari, Dirk Hovy, Silviu Paun, Barbara Plank, Massimo Poesio

Federico Bianchi, Silvia Terragni, Dirk Hovy

Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence

Topic models extract groups of words from documents, whose interpretation as a topic hopefully allows for a better understanding of the …

Federico Bianchi, Dirk Hovy

On the Gap between Adoption and Understanding in NLP

There are some issues with current research trends in NLP that can hamper the free development of scientific research. We identify five …

Dirk Hovy, Shrimai Prabhumoye

Five sources of bias in natural language processing

Recently, there has been an increased interest in demographically grounded bias in natural language processing (NLP) applications. Much …

'We will Reduce Taxes' - Identifying Election Pledges with Language Models

In an election campaign, political parties pledge to implement various projects–should they be elected. But do they follow …

Tommaso Fornaciari, Dirk Hovy, Elin Naurin, Julia Runeson, Robert Thomson, Pankaj Adhikari

The Importance of Modeling Social Factors of Language: Theory and Practice

Natural language processing (NLP) applications are now more powerful and ubiquitous than ever before. With rapidly developing (neural) …

Dirk Hovy, Diyi Yang

Debora Nozza, Federico Bianchi, Dirk Hovy

HONEST: Measuring Hurtful Sentence Completion in Language Models

Language models have revolutionized the field of NLP. However, language models capture and proliferate hurtful stereotypes, especially …

MilaNLP @ WASSA: Does BERT Feel Sad When You Cry?

The paper describes the MilaNLP team’s submission (Bocconi University, Milan) in the WASSA 2021 Shared Task on Empathy Detection and …

Tommaso Fornaciari, Federico Bianchi, Debora Nozza, Dirk Hovy

Federico Bianchi, Debora Nozza, Dirk Hovy

FEEL-IT: Emotion and Sentiment Classification for the Italian Language

Sentiment analysis is a common task to understand people’s reactions online. Still, we often need more nuanced information: is …

Tommaso Fornaciari, Alexandra Uma, Silviu Paun, Barbara Plank, Dirk Hovy, Massimo Poesio

Beyond Black & White: Leveraging Annotator Disagreement via Soft-Label Multi-Task Learning

Supervised learning assumes that a ground truth label exists. However, the reliability of this ground truth depends on human …

Sotiris Lamprinidis, Federico Bianchi, Daniel Hardt, Dirk Hovy

Universal Joy: A Data Set and Results for Classifying Emotions Across Languages

While emotions are universal aspects of human psychology, they are expressed differently across different languages and cultures. We …

Tommaso Fornaciari, Federico Bianchi, Dirk Hovy, Massimo Poesio

BERTective: Language Models and Contextual Information for Deception Detection

Spotting a lie is challenging but has an enormous potential impact on security as well as private and public safety. Several NLP …

Cross-lingual Contextualized Topic Models with Zero-shot Learning

We introduce a novel topic modeling method that can make use of contextualized embeddings (e.g., BERT) to do zero-shot cross-lingual …

Federico Bianchi, Silvia Terragni, Dirk Hovy, Debora Nozza, Elisabetta Fersini

Text Analysis in Python for Social Scientists – Discovery and Exploration

Text is everywhere, and it is a fantastic resource for social scientists. However, because it is so abundant, and because language is …

Dirk Hovy

Deven Santosh Shah, H. Andrew Schwartz, Dirk Hovy

Predictive Biases in Natural Language Processing Models: A Conceptual Framework and Overview

An increasing number of natural language processing papers address the effect of bias on predictions, introducing mitigation techniques …

“You Sound Just Like Your Father” Commercial Machine Translation Systems Include Stylistic Biases

The main goal of machine translation has been to convey the correct content. Stylistic considerations have been at best secondary. We …

Dirk Hovy, Federico Bianchi, Tommaso Fornaciari

Visualizing Regional Language Variation Across Europe on Twitter

Geotagged Twitter data allows us to investigate correlations of geographic language variation, both at an interlingual and intralingual …

Dirk Hovy, Afshin Rahimi, Timothy Baldwin, Julian Brooke

What the [MASK]? Making Sense of Language-Specific BERT Models

Recently, Natural Language Processing (NLP) has witnessed impressive progress in many areas, due to the advent of novel, pretrained …

Debora Nozza, Federico Bianchi, Dirk Hovy

Helpful or Hierarchical? Predicting the Communicative Strategies of Chat Participants, and their Impact on Success

When interacting with each other, we motivate, advise, inform, show love or power towards our peers. However, the way we interact may …

Farzana Rashid, Tommaso Fornaciari, Dirk Hovy, Eduardo Blanco, Fernando Vega-Redondo

A Case for Soft Loss Functions

Recently, Peterson et al. provided evidence of the benefits of using probabilistic soft labels generated from crowd annotations for …

Alexandra Uma, Tommaso Fornaciari, Dirk Hovy, Silviu Paun, Barbara Plank, Massimo Poesio

Tommaso Fornaciari, Dirk Hovy

Identifying Linguistic Areas for Geolocation

Geolocating social media posts relies on the assumption that language carries sufficient geographic information. However, locations are …

Hey Siri. Ok Google. Alexa: A topic modeling of user reviews for smart speakers

User reviews provide a significant source of information for companies to understand their market and audience. In order to discover …

Hanh Nguyen, Dirk Hovy

Tommaso Fornaciari, Dirk Hovy

Geolocation with Attention-Based Multitask Learning Models

Geolocation, predicting the location of a post based on text and other information, has a huge potential for several social media …

Tommaso Fornaciari, Dirk Hovy

Dense Node Representation for Geolocation

Prior research has shown that geolocation can be substantially improved by including user network information. While effective, it …

Women’s Syntactic Resilience and Men’s Grammatical Luck: Gender-Bias in Part-of-Speech Tagging and Dependency Parsing

Several linguistic studies have shown the prevalence of various lexical and grammatical patterns in texts authored by a person of a …

Aparna Garimella, Carmen Banea, Dirk Hovy, Rada Mihalcea

Peer networks and entrepreneurship: A Pan-African RCT

Can large-scale peer interaction foster entrepreneurship and innovation? We conducted an RCT involving almost 5,000 entrepreneurs from …

Fernando Vega-Redondo, Paolo Pin, Diego Ubfal, Cristiana Benedetti-Fasil, Charles Brummitt, Gaia Rubera, Dirk Hovy, Tommaso Fornaciari

Dirk Hovy, Tommaso Fornaciari

Increasing In-Class Similarity by Retrofitting Embeddings with Demographic Information

Sotiris Lamprinidis, Daniel Hardt, Dirk Hovy

Predicting News Headline Popularity with Syntactic and Semantic Knowledge Using Multi-Task Learning

Newspapers need to attract readers with headlines, anticipating their readers’ preferences. These preferences rely on topical, …

Comparing Bayesian Models of Annotation

The analysis of crowdsourced annotations in natural language processing is concerned with identifying (1) gold standard labels, (2) …

Silviu Paun, Bob Carpenter, Jon Chamberlain, Dirk Hovy, Udo Kruschwitz, Massimo Poesio

Dirk Hovy, Christoph Purschke

Capturing Regional Variation with Distributed Place Representations and Geographic Retrofitting

Dialects are one of the main drivers of language variation, a major challenge for natural language processing tools. In most languages, …

The Social and the Neural Network: How to Make Natural Language Processing about People again

Over the years, natural language processing has increasingly focused on tasks that can be solved by statistical models, but ignored the …

Dirk Hovy