Beyond Bag-of-Words
Lecture Slides (pdf) · Lecture Slides (ipynb)
Tutorial Exercise (pdf) · Tutorial Exercise (ipynb)
This week we move beyond the bag-of-words representation of text data. We look at word embeddings, a technique that represents each word as a dense, real-valued vector in a continuous vector space, in contrast to the sparse count vectors of bag-of-words. We discuss the advantages of word embeddings over traditional methods and the different ways to create and use them in practice.
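The core idea can be sketched in a few lines: once words are vectors, semantic similarity becomes a geometric quantity such as cosine similarity. The vectors below are invented for illustration only, not trained embeddings; in the tutorial you would load real pre-trained vectors (e.g., GloVe) instead.

```python
# A minimal sketch of the embedding idea: each word maps to a dense vector,
# and similarity in meaning is measured as cosine similarity between vectors.
# These 3-dimensional vectors are made up for illustration; trained
# embeddings (word2vec, GloVe) typically have 50-300 dimensions.
import math

embeddings = {
    "king":  [0.80, 0.65, 0.10],
    "queen": [0.78, 0.70, 0.12],
    "man":   [0.60, 0.20, 0.05],
    "woman": [0.58, 0.25, 0.07],
    "apple": [0.05, 0.10, 0.90],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Semantically related words sit close together in the space...
print(cosine(embeddings["king"], embeddings["queen"]))  # close to 1
# ...while unrelated words point in different directions.
print(cosine(embeddings["king"], embeddings["apple"]))  # much smaller
```

The same computation underlies the familiar analogy demonstrations (e.g., king − man + woman ≈ queen): vector arithmetic followed by a nearest-neighbour search under cosine similarity.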
Required Readings
- Grimmer, Roberts, and Stewart 2022, Chs. 7–8
- Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. "GloVe: Global Vectors for Word Representation." In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–1543. https://nlp.stanford.edu/pubs/glove.pdf
Additional Readings
- Jurafsky and Martin 2026, Ch. 5 "Embeddings"
- Tomas Mikolov et al. 2013. "Distributed Representations of Words and Phrases and Their Compositionality." In Proceedings of the 27th International Conference on Neural Information Processing Systems, 2:3111–3119. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
Videos
Tutorial
- Working with word embeddings