embeddings

Visualizing Regional Language Variation Across Europe on Twitter

Geotagged Twitter data allows us to investigate correlations of geographic language variation at both the interlingual and intralingual levels. Based on data-driven studies of such relationships, this paper investigates regional variation of language …

Capturing Regional Variation with Distributed Place Representations and Geographic Retrofitting

Dialects are one of the main drivers of language variation, a major challenge for natural language processing tools. In most languages, dialects exist along a continuum, and are commonly discretized by combining the extent of several preselected …