Tokens consisting of 1 document.
text1 :
[1] "The" "quick" "brown" "fox" "jumps" "over" "the" "lazy" "dog"
[10] "."
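The listing above is the printed form of a quanteda tokens object. As a rough illustration of what tokenization does, here is a minimal base R sketch. It uses a simple regular expression as a stand-in for quanteda's tokenizer, and the input sentence is an assumption reconstructed from the tokens shown.

```r
# Assumed input text, reconstructed from the printed tokens
text1 <- "The quick brown fox jumps over the lazy dog."

# Split into runs of letters/digits, keeping each punctuation mark as its own token.
# This is a simplified stand-in, not quanteda's actual tokenization rules.
toks <- regmatches(text1, gregexpr("[[:alnum:]]+|[[:punct:]]", text1))[[1]]

toks
length(toks)
```

The sentence-final full stop becomes a token of its own, which is why the listing has ten tokens rather than nine.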
POP77142 Quantitative Text Analysis for Social Scientists
| Document | brown | cat | dog | fox | jumps | lazy | over | quick | the |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 2 |
| 2 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 2 |
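A table like the one above can be produced by counting each vocabulary term in each document. The following base R sketch shows the idea. The two input sentences are assumptions reconstructed from the counts, and all object names are illustrative.

```r
# Assumed source texts (reconstructed from the table, not given in the slides)
texts <- c(
  "1" = "The quick brown fox jumps over the lazy dog",
  "2" = "The quick brown fox jumps over the lazy cat"
)

# Lowercase and split on anything that is not a letter or digit
tokens <- lapply(texts, function(x) strsplit(tolower(x), "[^[:alnum:]]+")[[1]])

# Shared, alphabetically sorted vocabulary defines the columns
vocab <- sort(unique(unlist(tokens)))

# Count every vocabulary term in every document (zero if absent)
dtm <- t(vapply(
  tokens,
  function(tok) as.integer(table(factor(tok, levels = vocab))),
  integer(length(vocab))
))
colnames(dtm) <- vocab

dtm
```

Because "The" is lowercased before counting, both documents have a count of 2 in the `the` column.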

| Document | deputy | i | please | resume | seat | thank | the | your |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 |
| 2 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 |
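library("Matrix")

# Build a sparse 2 x 8 document-term matrix from (row, column, value) triplets
sm <- Matrix::sparseMatrix(
  i = c(1, 1, 1, 1, 2, 2, 2, 2, 2),  # row (document) index of each non-zero cell
  j = c(1, 2, 6, 7, 1, 3, 4, 5, 8),  # column (term) index of each non-zero cell
  x = c(1, 1, 1, 1, 1, 1, 1, 1, 1),  # count stored in each non-zero cell
  dims = c(2, 8),
  dimnames = list(
    c("Doc1", "Doc2"),
    c("deputy", "i", "please", "resume", "seat", "thank", "the", "your")
  )
)
sm
2 x 8 sparse Matrix of class "dgCMatrix"
     deputy i please resume seat thank the your
Doc1      1 1      .      .    .     1   1    .
Doc2      1 .      1      1    1     .   .    1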
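The dots in the printed matrix are cells that are not stored at all. The sketch below rebuilds the same matrix from a dense one and measures how sparse it is. The object names (`dense`, `sparsity`, `term_freq`) are illustrative.

```r
library(Matrix)

# Dense version of the same 2 x 8 document-term matrix
dense <- matrix(
  c(1, 1, 0, 0, 0, 1, 1, 0,
    1, 0, 1, 1, 1, 0, 0, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("Doc1", "Doc2"),
    c("deputy", "i", "please", "resume", "seat", "thank", "the", "your")
  )
)

# Convert to a sparse representation that stores only non-zero cells
sm <- Matrix(dense, sparse = TRUE)

# Share of cells that are zero
sparsity <- 1 - nnzero(sm) / prod(dim(sm))

# Total frequency of each term across documents
term_freq <- colSums(sm)

sparsity
term_freq
```

With only two short documents, fewer than half the cells are empty. In real corpora with thousands of terms, the share is usually far higher, which is where sparse storage pays off.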
The quanteda package can be used to create DTMs.
General Inquirer (Stone et al. 1966): an early all-purpose dictionary for general texts, used e.g. for sentiment analysis.
Regressive Imagery Dictionary: designed to measure primordial vs. conceptual thinking.
Linguistic Inquiry and Word Count (LIWC) (Pennebaker et al. 2001): a large (paid) dictionary covering many psychological and related concepts.
Formal class 'dictionary2' [package "quanteda"] with 2 slots
..@ .Data:List of 4
.. ..$ :List of 1
.. .. ..$ : chr [1:2858] "a lie" "abandon*" "abas*" "abattoir*" ...
.. ..$ :List of 1
.. .. ..$ : chr [1:1709] "ability*" "abound*" "absolv*" "absorbent*" ...
.. ..$ :List of 1
.. .. ..$ : chr [1:1721] "best not" "better not" "no damag*" "no no" ...
.. ..$ :List of 1
.. .. ..$ : chr [1:2860] "not a lie" "not abandon*" "not abas*" "not abattoir*" ...
..@ meta :List of 3
.. ..$ system:List of 5
.. .. ..$ package-version:Classes 'package_version', 'numeric_version' hidden list of 1
.. .. .. ..$ : int [1:3] 1 9 9009
.. .. ..$ r-version :Classes 'R_system_version', 'package_version', 'numeric_version' hidden list of 1
.. .. .. ..$ : int [1:3] 3 6 2
.. .. ..$ system : Named chr [1:3] "Darwin" "x86_64" "kbenoit"
.. .. .. ..- attr(*, "names")= chr [1:3] "sysname" "machine" "user"
.. .. ..$ directory : chr "/Users/kbenoit/Dropbox (Personal)/GitHub/quanteda/quanteda"
.. .. ..$ created : Date[1:1], format: "2020-02-17"
.. ..$ object:List of 2
.. .. ..$ valuetype: chr "glob"
.. .. ..$ separator: chr " "
.. ..$ user :List of 6
.. .. ..$ title : chr "Lexicoder Sentiment Dictionary (2015)"
.. .. ..$ description: chr "The 2015 Lexicoder Sentiment Dictionary in quanteda dictionary format. \n\nThe dictionary consists of 2,858 \""| __truncated__
.. .. ..$ source : chr "Young, L. & Soroka, S. (2012). Affective News: The Automated Coding of Sentiment in Political Texts. Political "| __truncated__
.. .. ..$ url : chr "https://www.snsoroka.com/data-lexicoder/"
.. .. ..$ license : chr "The LSD is available for non-commercial academic purposes only. By using data_dictionary_LSD2015, you accept th"| __truncated__
.. .. ..$ keywords : chr [1:4] "political" "news" "sentiment" "media"
..$ names: chr [1:4] "negative" "positive" "neg_positive" "neg_negative"
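The structure above shows that the dictionary entries are glob patterns (`valuetype: "glob"`), so `abandon*` matches any token that starts with "abandon". The base R sketch below illustrates that matching logic on a toy dictionary. It uses only a few single-word entries visible in the output, ignores multi-word entries such as "a lie", and is not how quanteda implements lookup internally. The function name `count_matches` and the example tokens are hypothetical.

```r
# Toy dictionary: a handful of single-word entries taken from the listing above
toy_dict <- list(
  negative = c("abandon*", "abas*"),
  positive = c("ability*", "abound*")
)

# Hypothetical tokenized, lowercased document
tokens <- c("the", "plan", "was", "abandoned", "despite", "their", "ability")

# Count tokens matching any glob pattern of one dictionary key
count_matches <- function(patterns, tokens) {
  # glob2rx() turns a glob such as "abandon*" into a regex such as "^abandon"
  rx <- paste(utils::glob2rx(patterns), collapse = "|")
  sum(grepl(rx, tokens))
}

counts <- sapply(toy_dict, count_matches, tokens = tokens)
counts
```

Each key ends up with one match here: "abandoned" for `negative` and "ability" for `positive`.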
Extra
The quanteda.dictionaries package contains many of the dictionaries mentioned, including the Laver and Garry (2000) dictionary.
Formal class 'dictionary2' [package "quanteda"] with 2 slots
..@ concatenator: chr " "
..@ names : chr [1:9] "CULTURE" "ECONOMY" "ENVIRONMENT" "GROUPS" ...
..@ .Data :List of 9
.. ..$ :List of 4
.. .. ..$ CULTURE-HIGH :List of 1
.. .. .. ..$ : chr [1:8] "art" "artistic" "dance" "galler*" ...
.. .. ..$ CULTURE-POPULAR:List of 1
.. .. .. ..$ : chr "media"
.. .. ..$ SPORT :List of 1
.. .. .. ..$ : chr "angler*"
.. .. ..$ : chr [1:3] "people" "war_in_iraq" "civil_war"
.. ..$ :List of 3
.. .. ..$ +STATE+:List of 1
.. .. .. ..$ : chr [1:50] "accommodation" "age" "ambulance" "assist" ...
.. .. ..$ =STATE=:List of 1
.. .. .. ..$ : chr [1:71] "accountant" "accounting" "accounts" "advert*" ...
.. .. ..$ -STATE-:List of 1
.. .. .. ..$ : chr [1:62] "assets" "autonomy" "barrier*" "bid" ...
.. ..$ :List of 2
.. .. ..$ CON ENVIRONMENT:List of 1
.. .. .. ..$ : chr "produc*"
.. .. ..$ PRO ENVIRONMENT:List of 1
.. .. .. ..$ : chr [1:28] "car" "catalytic" "chemical*" "chimney*" ...
.. ..$ :List of 2
.. .. ..$ ETHNIC:List of 1
.. .. .. ..$ : chr [1:5] "asian*" "buddhist*" "ethnic*" "race" ...
.. .. ..$ WOMEN :List of 1
.. .. .. ..$ : chr [1:3] "girls" "woman" "women"
.. ..$ :List of 3
.. .. ..$ CONSERVATIVE:List of 1
.. .. .. ..$ : chr [1:11] "authority" "continu*" "disrupt*" "inspect*" ...
.. .. ..$ NEUTRAL :List of 1
.. .. .. ..$ : chr [1:38] "administr*" "advis*" "agenc*" "amalgamat*" ...
.. .. ..$ RADICAL :List of 1
.. .. .. ..$ : chr [1:23] "abolition" "accountable" "answerable" "consult*" ...
.. ..$ :List of 2
.. .. ..$ LAW-CONSERVATIVE:List of 1
.. .. .. ..$ : chr [1:52] "assaults" "bail" "burglar*" "constab*" ...
.. .. ..$ LAW-LIBERAL :List of 1
.. .. .. ..$ : chr [1:2] "harassment" "non-custodial"
.. ..$ :List of 1
.. .. ..$ : chr [1:16] "agricultur*" "badgers" "bird*" "countryside" ...
.. ..$ :List of 1
.. .. ..$ : chr "town*"
.. ..$ :List of 2
.. .. ..$ CONSERVATIVE:List of 1
.. .. .. ..$ : chr [1:32] "defend" "defended" "defending" "discipline" ...
.. .. ..$ LIBERAL :List of 1
.. .. .. ..$ : chr [1:10] "cruel*" "discriminat*" "human*" "injustice*" ...
..$ concatenator: chr " "
..$ names : chr [1:9] "CULTURE" "ECONOMY" "ENVIRONMENT" "GROUPS" ...
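The listing above is hierarchical: top-level keys such as CULTURE contain sub-keys such as CULTURE-HIGH, each with its own patterns. The sketch below represents a small part of that hierarchy as a nested base R list and collapses it to the top level. It includes only the first few entries visible in the truncated output, and the helper name `flatten_to_top` is hypothetical.

```r
# Truncated subset of the hierarchy shown above (not the full dictionary)
nested_dict <- list(
  CULTURE = list(
    "CULTURE-HIGH"    = c("art", "artistic", "dance", "galler*"),
    "CULTURE-POPULAR" = "media",
    SPORT             = "angler*"
  ),
  ENVIRONMENT = list(
    "CON ENVIRONMENT" = "produc*",
    "PRO ENVIRONMENT" = c("car", "catalytic", "chemical*", "chimney*")
  )
)

# Merge all sub-key patterns into their top-level key
flatten_to_top <- function(dict) {
  lapply(dict, function(key) unlist(key, use.names = FALSE))
}

top_level <- flatten_to_top(nested_dict)
lengths(top_level)
```

Whether to count at the sub-key or top-key level is an analytical choice: finer keys separate, for example, pro- and anti-environment language, while top-level keys measure attention to the topic as a whole.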
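\[ \text{immigration_focus}_i = \frac{M_i}{N_i} \]
Here \( M_i \) is the number of immigration dictionary matches in document \( i \) and \( N_i \) is its total number of tokens. With \( M^{anti}_i \) and \( M^{pro}_i \) counting matches to anti- and pro-immigration entries, position can be measured in three alternative ways: relative to document length,
\[ \text{immigration_position}_i = \frac{M^{anti}_i - M^{pro}_i}{N_i} \]
relative to all immigration matches,
\[ \text{immigration_position}_i = \frac{M^{anti}_i - M^{pro}_i}{M^{anti}_i + M^{pro}_i} \]
or as a log ratio,
\[ \text{immigration_position}_i = \log{\frac{M^{anti}_i}{M^{pro}_i}} \]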
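To make the formulas concrete, the sketch below evaluates them on made-up counts for two hypothetical documents. All numbers and object names are illustrative. It also assumes that the focus numerator is the sum of anti and pro matches, which the source does not state.

```r
# Made-up counts for two hypothetical documents A and B
m_anti   <- c(A = 30, B = 5)        # matches to anti-immigration entries
m_pro    <- c(A = 10, B = 15)       # matches to pro-immigration entries
n_tokens <- c(A = 10000, B = 8000)  # total tokens per document

# Focus: share of tokens that match the dictionary
# (assumes M_i = M_anti_i + M_pro_i)
focus <- (m_anti + m_pro) / n_tokens

# Position, version 1: net anti matches relative to document length
pos_length <- (m_anti - m_pro) / n_tokens

# Position, version 2: net anti matches relative to all matches, bounded in [-1, 1]
pos_share <- (m_anti - m_pro) / (m_anti + m_pro)

# Position, version 3: log ratio, symmetric around 0
pos_log <- log(m_anti / m_pro)

round(cbind(focus, pos_length, pos_share, pos_log), 4)
```

The first version mixes position with focus, since a document that rarely mentions immigration scores near zero regardless of its stance. The second and third condition on the matches only. The log ratio is undefined when either count is zero, so it needs some adjustment in that case.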
immigration_focus <- cbind(
  manifestos,
  quanteda::convert(manifestos_imm, to = "data.frame")
) |>
  # add the total number of tokens per manifesto
  (\(df) transform(df, ntokens = quanteda::ntoken(manifestos_toks)))() |>
  # share of tokens that match the immigration dictionary
  (\(df) transform(df, rel_imm = immigration / ntokens))() |>
  # keep the relevant columns (the `_[` placeholder form needs R >= 4.3)
  _[, c("party", "immigration", "ntokens", "rel_imm")] |>
  # sort from highest to lowest relative focus
  (\(df) `[`(df, order(df$rel_imm, decreasing = TRUE), ))()
immigration_focus
      party immigration ntokens      rel_imm
text5    II          31    7295 0.0042494859
text7   PBP          26   11976 0.0021710087
text1    AO          53   27749 0.0019099787
text4    GR          24   29110 0.0008244589
text2    FF          24   33676 0.0007126737
text8    SD          36   58281 0.0006176970
text3    FG          32   52942 0.0006044350
text9    SF          28   48813 0.0005736177
text6   LAB          31   63107 0.0004912292
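Shares this small are easier to compare when expressed per 1,000 tokens. The sketch below re-enters the counts printed above by hand instead of recomputing them from the manifestos, and the column name `per_1000` is illustrative.

```r
# Counts copied from the printed table above
res <- data.frame(
  party       = c("II", "PBP", "AO", "GR", "FF", "SD", "FG", "SF", "LAB"),
  immigration = c(31, 26, 53, 24, 24, 36, 32, 28, 31),
  ntokens     = c(7295, 11976, 27749, 29110, 33676, 58281, 52942, 48813, 63107)
)

# Relative focus, and the same quantity per 1,000 tokens
res$rel_imm  <- res$immigration / res$ntokens
res$per_1000 <- round(1000 * res$rel_imm, 2)

# Sort from highest to lowest relative focus
res <- res[order(res$rel_imm, decreasing = TRUE), ]
res[, c("party", "immigration", "ntokens", "per_1000")]
```

On this scale, the ranking separates raw counts from relative attention: AO has the most matches in absolute terms, but II devotes the largest share of its much shorter manifesto to immigration terms.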