Quantifying Texts
Lecture Slides
- Lecture Slides (pdf)
- Lecture Slides (ipynb)
Tutorial Exercise
- Tutorial Exercise (pdf)
- Tutorial Exercise (ipynb)
This week we focus on representing text in a format suitable for quantitative analysis, in particular the bag-of-words (BOW) representation, which records how often each word occurs in a document and discards word order. We will also look at the basic principles of text preprocessing and tokenization.
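As a rough illustration of the idea, the sketch below builds a small document-term matrix using only the Python standard library. The helper names (`tokenize`, `bag_of_words`), the toy stopword list, and the two example sentences are assumptions made for this illustration and are not taken from the lecture or tutorial materials. Real preprocessing pipelines involve more choices, such as stemming, n-grams, and handling of numbers.

```python
import re
from collections import Counter

# Toy stopword list, assumed for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "is"}


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs, stopwords=STOPWORDS):
    """Return (vocabulary, document-term matrix) for a list of strings.

    Each row of the matrix corresponds to one document and each column
    to one vocabulary term; entries are raw term counts.
    """
    counts = [
        Counter(tok for tok in tokenize(doc) if tok not in stopwords)
        for doc in docs
    ]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


docs = ["The cat sat on the mat.", "The dog chased the cat."]
vocab, dtm = bag_of_words(docs)
print(vocab)
for row in dtm:
    print(row)
```

Word order is lost in this representation: any reordering of the words in a document produces the same row of counts.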
Required Readings
- Grimmer, Roberts & Stewart, Ch. 5: Bag of Words.
- Matthew J. Denny and Arthur Spirling. 2018. “Text Preprocessing for Unsupervised Learning: Why It Matters, When It Misleads, and What to Do About It.” Political Analysis 26 (2): 168–189. https://arthurspirling.org/documents/preprocessing.pdf
Additional Readings
- Kasper Welbers, Wouter Van Atteveldt, and Kenneth Benoit. 2017. “Text Analysis in R.” Communication Methods and Measures 11 (4): 245–265. https://kenbenoit.net/pdfs/text_analysis_in_R.pdf
- Paul C. Bauer, Camille Landesvatter, and Lion Behrens, eds. 2024. APIs for Social Scientists: A Collaborative Review. https://paulcbauer.github.io/apis_for_social_scientists_a_review
Tutorial
- Using APIs
- Representing text as a bag-of-words