Unsupervised Modelling
This week we focus on the probabilistic aspects of modelling text. We look at some commonly assumed data-generating processes and at the analytical approaches that follow from them.
Required Readings
- Grimmer, Roberts, and Stewart 2022, Chs. 6 (The Multinomial Language Model), 12 (Clustering), and 13 (Topic Models).
- Blei, David M. 2012. “Probabilistic Topic Models.” Communications of the ACM 55 (4): 77–84. https://www.cs.columbia.edu/~blei/papers/Blei2012.pdf
Additional Readings
- Benoit, Kenneth, Michael Laver, and Slava Mikhaylov. 2009. “Treating Words as Data with Error: Uncertainty in Text Statements of Policy Positions.” American Journal of Political Science 53 (2): 495–513. https://kenbenoit.net/pdfs/blm2009ajps.pdf
- Roberts, Margaret E., et al. 2014. “Structural Topic Models for Open-Ended Survey Responses.” American Journal of Political Science 58 (4): 1064–1082. https://scholar.harvard.edu/sites/scholar.harvard.edu/files/dtingley/files/topicmodelsopenendedexperiments.pdf
Tutorial
- Authorship prediction and topic modelling