Week 1: Introduction
POP77142 Quantitative Text Analysis for Social Scientists
Overview
- Module objectives
- Prerequisites and software
- Materials and books
- Module meetings
- Assessment and collaboration
- Weekly schedule
Module Objectives
- Introduce the fundamentals of working with text as data;
- Extract and prepare textual data for analysis;
- Apply key computational techniques for textual data;
- Practice these concepts using social science examples.
Books
Also:
- Christopher Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. The MIT Press
- Jacob Eisenstein. 2019. Introduction to Natural Language Processing. Cambridge, MA: The MIT Press.
- Klaus Krippendorff. 2019. Content Analysis: An Introduction to Its Methodology. 4th. Thousand Oaks, CA: SAGE Publications
Additional Online Materials
Prerequisites and Software
- Intermediate module - familiarity with basic statistical concepts and programming in R/Python is assumed.
- Laptop with Windows/Mac/Linux OS (no Chrome books)
- Required software:
- Jupyter - web-based interactive computational environment
- Python (version 3+) - versatile programming language
- R (version 4+) - statistical programming language
- Additional software:
Module Meetings
- 2-hour lecture
- 2-hour tutorials
- Office hours:
Plagiarism
- Plagiarising computer code is as serious as plagiarising text (see Google LLC v. Oracle America, Inc.)
- All submitted programming assignments and final project should be done individually;
- You may discuss general approaches to solutions with your peers;
- But do not share or view each others code;
- You can use online resources but give credit in the comments.
Generative AI
- The use of generative AI is permitted.
- However:
- No part of the module content can be used in a prompt;
- It needs to be explicitly acknowledged in the submission;
- You need to state the version of the model used.
- Hardware permitting, I recommend using local offline models installed on your machine.
- E.g. check LM Studio as a user-friendly interface to different models.
Module Outline
| 8 |
12 March |
Introduction |
Assignment 1 |
|
| 9 |
19 March |
Quantifying Texts |
|
|
| 10 |
26 March |
Classifying Texts |
Assignment 2 |
Assignment 1 |
| 11 |
2 April |
Modelling Texts |
|
|
| 12 |
9 April |
Beyond BOW |
|
Assignment 2 |