POP77032 Quantitative Text Analysis for Social Scientists
# As the full impeachment debate is split across two sections,
# we will first download both HTML files and then combine them into a single text
impeachment1 <- rvest::read_html(
"https://www.govinfo.gov/content/pkg/CREC-2021-01-13/html/CREC-2021-01-13-pt1-PgH151-8.htm"
)
impeachment2 <- rvest::read_html(
"https://www.govinfo.gov/content/pkg/CREC-2021-01-13/html/CREC-2021-01-13-pt1-PgH165.htm"
)

\[ \textbf{cut} \]
and
\[ \textbf{cats} \]
- How different are they?
flowchart LR
A["cut"]
B["cat"]
C["cats"]
A -- "substitute u → a (1)" --> B
B -- "insert s (1)" --> C
In R, use the stringdist package to calculate various string distance metrics.
In Python, use the textdistance library to calculate string distances.
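The two edits in the flowchart (cut → cat → cats) correspond to a Levenshtein distance of 2. As a minimal sketch, the distance can be computed with a small dynamic-programming function in base R. The helper name `levenshtein` is hypothetical and used here only for illustration; it is not part of any package. The result can be cross-checked against base R's `utils::adist()` and, if the package is installed, against `stringdist::stringdist()` with `method = "lv"`.

```r
# Illustrative helper (hypothetical name): Levenshtein distance
# computed with dynamic programming, using base R only.
levenshtein <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)

  # d[i + 1, j + 1] holds the distance between the first i characters
  # of `a` and the first j characters of `b`.
  d <- matrix(0L, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n  # deleting i characters
  d[1, ] <- 0:m  # inserting j characters

  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + 1L,  # deletion
        d[i + 1, j] + 1L,  # insertion
        d[i, j] + cost     # substitution (or match)
      )
    }
  }
  d[n + 1, m + 1]
}

levenshtein("cut", "cats")  # 2: substitute u -> a, then insert s

# Cross-check with base R's generalized Levenshtein distance
utils::adist("cut", "cats")[1, 1]

# Optional cross-check, only run if stringdist is installed
if (requireNamespace("stringdist", quietly = TRUE)) {
  stringdist::stringdist("cut", "cats", method = "lv")
}
```

Writing the function out makes the unit cost of each operation explicit; in practice the packaged implementations are preferable because they are vectorised and offer further metrics.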