Week 1: Introduction
POP77032 Quantitative Text Analysis for Social Scientists
Overview
- Module objectives
- Prerequisites and software
- Materials and books
- Module meetings
- Assessment and collaboration
- Weekly schedule
Module Objectives
- Introduce the fundamentals of working with text as data;
- Extract and prepare textual data for analysis;
- Apply key computational techniques for textual data;
- Practice these concepts using social science examples.
Books
Also:
- Christopher Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, MA: The MIT Press.
- Jacob Eisenstein. 2019. Introduction to Natural Language Processing. Cambridge, MA: The MIT Press.
- Klaus Krippendorff. 2019. Content Analysis: An Introduction to Its Methodology. 4th ed. Thousand Oaks, CA: SAGE Publications.
Additional Online Materials
Prerequisites and Software
- Intermediate module - familiarity with basic statistical concepts and programming in R/Python is assumed.
- Laptop with Windows/Mac/Linux OS (no Chromebooks)
- Required software:
  - Jupyter - web-based interactive computational environment
  - Python (version 3+) - versatile programming language
  - R (version 4+) - statistical programming language
- Additional software:
Module Meetings
- 2-hour lecture
  - Until Reading Week - Wednesday 16:00-18:00 in 4050A Arts Building
  - After Reading Week - Wednesday 16:00-18:00 in 5052 Arts Building
- 2-hour tutorials
- Office hours:
Plagiarism Policy
- Plagiarising computer code is as serious as plagiarising text (see Google LLC v. Oracle America, Inc.)
- All submitted programming assignments and the final project must be completed individually;
- You may discuss general approaches to solutions with your peers;
- But do not share or view each other's code;
- You may use online resources, but give credit in the code comments.
Generative AI Policy
- The use of generative AI is permitted.
- However:
  - No part of the module content can be used in a prompt;
  - Any use of generative AI needs to be explicitly acknowledged in the submission;
  - You need to state the version of the model used.
- Hardware permitting, I recommend using local offline models installed on your machine.
  - For example, LM Studio offers a user-friendly interface to different models.
Module Outline
| 1 | 21 January | Introduction | | |
| 2 | 28 January | Words and Tokens | Assignment 1 | |
| 3 | 4 February | Quantifying Texts | | |
| 4 | 11 February | Dictionaries and Sentiment | | Assignment 1 |
| 5 | 18 February | Supervised Modelling | Assignment 2 | |
| 6 | 25 February | Unsupervised Modelling | | |
| 7 | 4 March | - | | Assignment 2 |
| 8 | 11 March | Beyond Bag-of-Words | | |
| 9 | 18 March | Embeddings | Assignment 3 | |
| 10 | 25 March | Neural Networks | | |
| 11 | 1 April | Transformers | | Assignment 3 |
| 12 | 8 April | Large Language Models | | |