Beyond Bag-of-Words
Lecture Slides (pdf) Lecture Slides (ipynb)
Tutorial Exercise (pdf) Tutorial Exercise (ipynb)
This week we move beyond the bag-of-words representation of text data. We look at word embeddings, a technique that represents each word as a dense, real-valued vector in a continuous vector space, so that words used in similar contexts end up close together. We discuss the advantages of word embeddings over traditional count-based methods and the different ways to create and use them in practice.
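As a quick illustration of what "using embeddings in practice" looks like, the minimal sketch below loads pretrained GloVe vectors and queries them for neighbours and analogies. It assumes the gensim library and its downloader (our choice for illustration, not necessarily the course's required toolkit); the tutorial covers the workflow in detail.

```python
# Minimal sketch: querying pretrained word embeddings with gensim.
# Assumes gensim is installed and an internet connection is available
# to download the vectors on first use.
import gensim.downloader as api

# 50-dimensional GloVe vectors trained on Wikipedia + Gigaword.
glove = api.load("glove-wiki-gigaword-50")

# Each word is a dense, real-valued vector.
print(glove["politics"].shape)  # (50,)

# Nearest neighbours by cosine similarity capture semantic relatedness.
print(glove.most_similar("parliament", topn=5))

# The classic analogy test: king - man + woman ≈ queen.
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```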
Required Readings
- Chs. 7–8 of Grimmer, Roberts, and Stewart 2022.
- Rodriguez, Pedro L., and Arthur Spirling. 2022. “Word Embeddings: What Works, What Doesn’t, and How to Tell the Difference for Applied Research.” Journal of Politics 84 (1): 101–115. http://arthurspirling.org/documents/embed.pdf
Additional Readings
- Ch. 6, “Vector Semantics and Embeddings,” in Jurafsky and Martin 2025.
- Rodman, Emma. 2020. “A Timely Intervention: Tracking the Changing Meanings of Political Concepts with Word Vectors.” Political Analysis 28 (1): 87–111.
- Rheault, Ludovic, and Christopher Cochrane. 2020. “Word Embeddings for the Analysis of Ideological Placement in Parliamentary Corpora.” Political Analysis 28 (1): 112–133. https://lrheault.github.io/downloads/rheaultcochrane2019_pa.pdf
- Mikolov, Tomas, et al. 2013. “Distributed Representations of Words and Phrases and Their Compositionality.” In Proceedings of the 27th International Conference on Neural Information Processing Systems, 2:3111–3119. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
- Pennington, Jeffrey, Richard Socher, and Christopher Manning. 2014. “GloVe: Global Vectors for Word Representation.” In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–1543. https://nlp.stanford.edu/pubs/glove.pdf
Tutorial
- Working with word embeddings