Beyond Bag-of-Words

Lecture Slides (pdf) (ipynb)

Tutorial Exercise (pdf) (ipynb)


This week we move beyond the bag-of-words representation of text data. We look at word embeddings, a technique that represents words as dense, real-valued vectors in a continuous vector space, typically of much lower dimension than the vocabulary. We discuss the advantages of word embeddings over traditional count-based methods and the different ways to create and use them in practice.
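The core idea can be sketched in a few lines: each word maps to a vector, and geometric closeness (e.g. cosine similarity) stands in for semantic similarity. The tiny three-dimensional vectors below are invented for illustration only; real embeddings (such as GloVe) are trained on large corpora and have hundreds of dimensions.

```python
import math

# Toy embeddings, hand-made for illustration (not trained).
embeddings = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 = same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Semantically related words should score higher than unrelated ones.
sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
print(sim_royal > sim_fruit)
```

In the tutorial we work with pre-trained embeddings rather than hand-made vectors, but the similarity computation is the same.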

Required Readings

  • Chapters 7–8 of Grimmer, Roberts, and Stewart 2022.
  • Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. “GloVe: Global Vectors for Word Representation.” In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–1543. https://nlp.stanford.edu/pubs/glove.pdf

Additional Readings

Videos

Tutorial

  • Working with word embeddings