Beyond Bag-of-Words
Lecture Slides (pdf) (ipynb)
Tutorial Exercise (pdf) (ipynb)
This week we move beyond the classic bag-of-words representation of text and look at how to take account of word order and context.
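Bag-of-words representations discard word order entirely, so sentences with very different meanings can end up indistinguishable. A minimal sketch of this limitation, using scikit-learn's CountVectorizer (an assumption; any document-term matrix implementation behaves the same way):

```python
# Two sentences with opposite meanings but identical bags of words.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["man bites dog", "dog bites man"]
counts = CountVectorizer().fit_transform(docs).toarray()

print(counts)                           # both rows are [1, 1, 1] over {bites, dog, man}
print((counts[0] == counts[1]).all())   # True: the representation cannot tell them apart
```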
Required Readings
- Grimmer, Justin, Margaret E. Roberts, and Brandon M. Stewart. 2022. Text as Data: A New Framework for Machine Learning and the Social Sciences. Princeton University Press. Chs. 7 (“The Vector Space Model and Similarity Metrics”) and 8 (“Distributed Representations of Words”).
- Turney, Peter D., and Patrick Pantel. 2010. “From Frequency to Meaning: Vector Space Models of Semantics.” Journal of Artificial Intelligence Research 37 (1): 141–88. https://doi.org/10.1613/jair.2934
Additional Readings
- Pennington, Jeffrey, Richard Socher, and Christopher Manning. 2014. “GloVe: Global Vectors for Word Representation.” In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–1543. https://nlp.stanford.edu/pubs/glove.pdf
- Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Distributed Representations of Words and Phrases and Their Compositionality.” In Proceedings of the 27th International Conference on Neural Information Processing Systems, 2:3111–3119. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
Tutorial
- Working with word embeddings
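As a taste of what the tutorial covers, here is a minimal sketch of loading and querying pretrained word embeddings with gensim. The package and the "glove-wiki-gigaword-50" model name are assumptions on my part, not part of the course materials; that model is one of the datasets hosted by gensim's downloader module.

```python
# A minimal sketch of working with pretrained word embeddings via gensim.
# Assumes gensim is installed (pip install gensim).
import gensim.downloader as api

# Download (on first use) and load 50-dimensional GloVe vectors.
vectors = api.load("glove-wiki-gigaword-50")

# Each word maps to a dense vector; similar words have similar vectors.
print(vectors["government"][:5])                   # first 5 dimensions of one vector
print(vectors.similarity("tax", "levy"))           # cosine similarity of two words
print(vectors.most_similar("parliament", topn=5))  # nearest neighbours in vector space
```

The key idea is that each word is mapped to a dense vector, so relatedness between words can be measured geometrically (here, by cosine similarity) rather than by exact string matching as in a bag-of-words model.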