Beyond Bag-of-Words

Lecture Slides (pdf) Lecture Slides (ipynb)

Tutorial Exercise (pdf) Tutorial Exercise (ipynb)

This week we move beyond the bag-of-words representation of text data. We look at word embeddings, a technique that represents each word as a dense vector in a continuous vector space, so that words used in similar contexts end up close together. We discuss the advantages of word embeddings over traditional count-based methods and the different ways to create and use them in practice.
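The geometric intuition behind embeddings can be sketched with a toy example: if words are vectors, semantic similarity becomes cosine similarity between them. The 4-dimensional vectors below are made-up values for illustration only; real embeddings (e.g. word2vec or GloVe) are learned from corpora and typically have 100–300 dimensions.

```python
import numpy as np

# Hypothetical toy embeddings (illustrative values, not learned from data)
embeddings = {
    "king":  np.array([0.8, 0.6, 0.1, 0.2]),
    "queen": np.array([0.7, 0.6, 0.3, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9, 0.7]),
}

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors (1 = same direction)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])

# Related words should be closer in the vector space than unrelated ones
print(sim_royal > sim_fruit)
```

In a bag-of-words representation, "king" and "queen" are unrelated columns of a document-term matrix; in an embedding space their vectors can be compared directly, which is what the tutorial exploits with pretrained embeddings.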

Required Readings

  • Justin Grimmer, Margaret E. Roberts, and Brandon M. Stewart. 2022. Text as Data: A New Framework for Machine Learning and the Social Sciences. Princeton University Press, chs. 7–8.
  • Pedro L. Rodriguez and Arthur Spirling. 2022. “Word Embeddings: What works, what doesn’t, and how to tell the difference for applied research.” Journal of Politics 84 (1): 101–115. http://arthurspirling.org/documents/embed.pdf

Additional Readings

Videos

Tutorial

  • Working with word embeddings