Publications

Emotions play important epistemological and cognitive roles in our lives, revealing our values and guiding our actions. Previous work …

Much recent work seeks to evaluate values and opinions in large language models (LLMs) using multiple-choice surveys and …

The open-ended nature of language generation makes the evaluation of autoregressive large language models (LLMs) challenging. One …

As diverse linguistic communities and users adopt large language models (LLMs), assessing their safety across languages becomes …

Without proper safeguards, large language models will readily follow malicious instructions and generate toxic content. This risk …

Social scientists increasingly use demographically stratified social media data to study the attitudes, beliefs, and behavior of the …

Using large language models (LLMs) for educational applications like dialogue-based teaching is a hot topic. Effective teaching, …

The last two years have seen a rapid growth in concerns around the safety of large language models (LLMs). Researchers and …

Emotions are a central aspect of communication. Consequently, emotion analysis (EA) is a rapidly growing field in natural language …

Open conversations are one of the most engaging forms of teaching. However, creating those conversations in educational software is a …

Large language models (LLMs) reflect societal norms and biases, especially about gender. While societal biases and stereotypes have …

Since Labov’s (1964) foundational work on the social stratification of language, linguistics has dedicated concerted efforts …

Since the foundational work of William Labov on the social stratification of language (Labov, 1964), linguistics has made concentrated …

Large Language Models (LLMs) exhibit remarkable text classification capabilities, excelling in zero- and few-shot learning (ZSL and …

As 3rd-person pronoun usage shifts to include novel forms, e.g., neopronouns, we need more research on identity-inclusive NLP. …

As 3rd-person pronoun usage shifts to include novel forms, e.g., neopronouns, we need more research on identity-inclusive NLP. …

Work on hate speech has made considering rude and harmful examples in scientific publications inevitable. This situation raises various …

Many NLP tasks exhibit human label variation, where different annotators give different labels to the same texts. This variation is …

Much work in natural language processing (NLP) relies on human annotation. The majority of this implicitly assumes that annotator’s …

Hate speech detection faces two significant challenges: 1) the limited availability of labeled data and 2) the high variability of hate …

We present the system proposed by the MilaNLP team for the Explainable Detection of Online Sexism (EDOS) shared task. We propose an …

Detecting misinformation threads is crucial to guarantee a healthy environment on social media. We address the problem using the data …

Natural Language Processing has seen impressive gains in recent years. This research includes the demonstration by NLP models to have …

Demographic factors (e.g., gender or age) shape our language. Previous work showed that incorporating demographic factors can …

Large language models (LLMs) offer a range of new possibilities, including adapting the text to different audiences and their reading …

Increasingly taking place in online spaces, modern political conversations are typically perceived to be unproductively …

Artificial Intelligence (AI) is at a crucial point in its development: stable enough to be used in production systems, and increasingly …

Twitter data have become essential to Natural Language Processing (NLP) and social science research, driving various scientific …

Well-annotated data is a prerequisite for good Natural Language Processing models. Too often, though, annotation decisions are governed …

Pre-trained language models (PLMs) have outperformed other NLP models on a wide range of tasks. Opting for a more thorough …

Fairness and environmental impact are important research directions for the sustainable development of artificial intelligence. …

Hate speech is a global phenomenon, but most hate speech datasets so far focus on English-language content. This hinders the …

Language is constantly changing and evolving, leaving language models to quickly become outdated, both factually and linguistically. …

The world of pronouns is changing – from a closed word class with few members to an open set of terms to reflect identities. However, …

Over the last several years, end-to-end neural conversational agents have vastly improved their ability to carry unrestricted, …

Natural Language Processing (NLP)’s applied nature makes it necessary to select the most effective and robust models. Producing …

Meaning is context-dependent, but many properties of language (should) remain the same even if we transform the context. For example, …

Detecting emotion in text allows social and computational scientists to study how people behave and react to online events. However, …

Labelled data is the foundation of most natural language processing tasks. However, labelling data is difficult and there often are …

The maturity level of language models is now at a stage in which many companies rely on them to solve various tasks. However, while …

Current language technology is ubiquitous and directly influences individuals’ lives worldwide. Given the recent trend in AI on …

Transformer-based Natural Language Processing models have become the standard for hate speech detection. However, the unconscious use …

The social impact of natural language processing and its applications has received increasing attention. In this position paper, we …

Natural Language Processing (NLP) models risk overfitting to specific terms in the training data, thereby reducing their performance, …

Text contains a wealth of information about a wide variety of sociocultural constructs. Automated prediction methods can infer …

Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer evidence that humans disagree, from objective tasks such …

Topic models extract groups of words from documents, whose interpretation as a topic hopefully allows for a better understanding of the …

There are some issues with current research trends in NLP that can hamper the free development of scientific research. We identify five …

Recently, there has been an increased interest in demographically grounded bias in natural language processing (NLP) applications. Much …

In an election campaign, political parties pledge to implement various projects, should they be elected. But do they follow …

Natural language processing (NLP) applications are now more powerful and ubiquitous than ever before. With rapidly developing (neural) …

Language models have revolutionized the field of NLP. However, language models capture and proliferate hurtful stereotypes, especially …

The paper describes the MilaNLP team’s submission (Bocconi University, Milan) in the WASSA 2021 Shared Task on Empathy Detection and …

Sentiment analysis is a common task to understand people’s reactions online. Still, we often need more nuanced information: is …

Supervised learning assumes that a ground truth label exists. However, the reliability of this ground truth depends on human …

While emotions are universal aspects of human psychology, they are expressed differently across different languages and cultures. We …

Spotting a lie is challenging but has an enormous potential impact on security as well as private and public safety. Several NLP …

We introduce a novel topic modeling method that can make use of contextualized embeddings (e.g., BERT) to do zero-shot cross-lingual …

Text is everywhere, and it is a fantastic resource for social scientists. However, because it is so abundant, and because language is …

An increasing number of natural language processing papers address the effect of bias on predictions, introducing mitigation techniques …

The main goal of machine translation has been to convey the correct content. Stylistic considerations have been at best secondary. We …

Geotagged Twitter data allows us to investigate correlations of geographic language variation, both at an interlingual and intralingual …

Recently, Natural Language Processing (NLP) has witnessed impressive progress in many areas, due to the advent of novel, pretrained …

When interacting with each other, we motivate, advise, inform, show love or power towards our peers. However, the way we interact may …

Recently, Peterson et al. provided evidence of the benefits of using probabilistic soft labels generated from crowd annotations for …

Geolocating social media posts relies on the assumption that language carries sufficient geographic information. However, locations are …

User reviews provide a significant source of information for companies to understand their market and audience. In order to discover …

Geolocation, predicting the location of a post based on text and other information, has a huge potential for several social media …

Prior research has shown that geolocation can be substantially improved by including user network information. While effective, it …

Several linguistic studies have shown the prevalence of various lexical and grammatical patterns in texts authored by a person of a …

Can large-scale peer interaction foster entrepreneurship and innovation? We conducted an RCT involving almost 5,000 entrepreneurs from …

Newspapers need to attract readers with headlines, anticipating their readers’ preferences. These preferences rely on topical, …

The analysis of crowdsourced annotations in natural language processing is concerned with identifying (1) gold standard labels, (2) …

Dialects are one of the main drivers of language variation, a major challenge for natural language processing tools. In most languages, …

Over the years, natural language processing has increasingly focused on tasks that can be solved by statistical models, but ignored the …