NLP | Dirk Hovy

Wisdom of Instruction-Tuned Language Model Crowds. Exploring Model Label Variation

Large Language Models (LLMs) exhibit remarkable text classification capabilities, excelling in zero- and few-shot learning (ZSL and FSL) scenarios. However, since they are trained on different datasets, performance varies widely across tasks between …

MilaNLP at SemEval-2023 Task 10: Ensembling Domain-Adapted and Regularized Pretrained Language Models for Robust Sexism Detection

We present the system proposed by the MilaNLP team for the Explainable Detection of Online Sexism (EDOS) shared task. We propose an ensemble modeling approach to combine different classifiers trained with domain adaptation objectives and standard …

Respectful or Toxic? Using Zero-Shot Learning with Language Models to Detect Hate Speech

Hate speech detection faces two significant challenges: 1) the limited availability of labeled data and 2) the high variability of hate speech across different contexts and languages. Prompting brings a ray of hope to these challenges. It allows …

Temporal and Second Language Influence on Intra-Annotator Agreement and Stability in Hate Speech Labelling

Much work in natural language processing (NLP) relies on human annotation. The majority of this implicitly assumes that annotator’s labels are temporally stable, although the reality is that human judgements are rarely consistent over time. As a …

The Ecological Fallacy in Annotation: Modeling Human Label Variation goes beyond Sociodemographics

Many NLP tasks exhibit human label variation, where different annotators give different labels to the same texts. This variation is known to depend, at least in part, on the sociodemographics of annotators. Recent research aims to model individual …

The State of Profanity Obfuscation in Natural Language Processing Scientific Publications

Work on hate speech has made considering rude and harmful examples in scientific publications inevitable. This situation raises various problems, such as whether or not to obscure profanities. While science must accurately disclose what it does, the …

What about ''em''? How Commercial Machine Translation Fails to Handle (Neo-)Pronouns

As 3rd-person pronoun usage shifts to include novel forms, e.g., neopronouns, we need more research on identity-inclusive NLP. Exclusion is particularly harmful in one of the most popular NLP applications, machine translation (MT). Wrong pronoun …

What about ''em''? How Commercial Machine Translation Fails to Handle (Neo-)Pronouns

Leveraging Social Interactions to Detect Misinformation on Social Media

Detecting misinformation threads is crucial to guarantee a healthy environment on social media. We address the problem using the data set created during the COVID-19 pandemic. It contains cascades of tweets discussing information weakly labeled as …

Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP)

Natural Language Processing has seen impressive gains in recent years. This research includes the demonstration by NLP models to have turned into useful technologies with improved capabilities, measured in terms of how well they match human behavior …