Language Models

DADIT: A Dataset for Demographic Classification of Italian Twitter Users and a Comparison of Prediction Methods

Social scientists increasingly use demographically stratified social media data to study the attitudes, beliefs, and behavior of the general public. To facilitate such analyses, we construct, validate, and release the representative DADIT dataset of …

MilaNLP at SemEval-2023 Task 10: Ensembling Domain-Adapted and Regularized Pretrained Language Models for Robust Sexism Detection

We present the system proposed by the MilaNLP team for the Explainable Detection of Online Sexism (EDOS) shared task. We propose an ensemble modeling approach to combine different classifiers trained with domain adaptation objectives and standard …

Can Demographic Factors Improve Text Classification? Revisiting Demographic Adaptation in the Age of Transformers

SocioProbe: What, When, and Where Language Models Learn about Sociodemographics

Cross-lingual Contextualized Topic Models with Zero-shot Learning