Quantifying Texts

Lecture Slides: pdf, ipynb

Tutorial Exercise: pdf, ipynb


This week we focus on representing text in a format suitable for quantitative analysis, in particular the bag-of-words (BOW) representation, which records how often each word occurs in a document and discards word order. We will also look at the basic principles of text preprocessing and tokenization.
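To make the idea concrete, here is a minimal sketch of a bag-of-words representation using only the Python standard library. The two example documents, the `tokenize` helper, and the preprocessing choices (lowercasing, keeping only runs of letters) are illustrative assumptions, not the procedure used in the lecture or tutorial materials.

```python
import re
from collections import Counter

# Toy corpus (hypothetical example documents).
docs = [
    "The cat sat on the mat.",
    "The dog chased the cat!",
]

def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters.

    This is one simple preprocessing choice among many: it drops
    punctuation and digits and does no stemming or stopword removal.
    """
    return re.findall(r"[a-z]+", text.lower())

# One word-count dictionary per document.
counts = [Counter(tokenize(d)) for d in docs]

# Shared vocabulary: every distinct token in the corpus, sorted.
vocab = sorted(set().union(*counts))

# Document-term matrix: one row per document, one column per vocabulary word.
dtm = [[c[word] for word in vocab] for c in counts]

print(vocab)
for row in dtm:
    print(row)
```

Each row of `dtm` is a vector of word counts, so the two sentences become comparable numeric objects even though the original word order is lost. Different preprocessing choices (for example, removing stopwords such as "the") would yield a different matrix.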

Required Readings

  • Grimmer, Roberts & Stewart, Ch. 5: Bag of Words.
  • Matthew J. Denny and Arthur Spirling. 2018. “Text Preprocessing for Unsupervised Learning: Why It Matters, When It Misleads, and What to Do About It.” Political Analysis 26 (2): 168–189. https://arthurspirling.org/documents/preprocessing.pdf

Additional Readings

Tutorial

  • Using APIs
  • Representing text as a bag-of-words