POP77142 Quantitative Text Analysis for Social Scientists
| Document | brown | cat | dog | fox | jumps | lazy | over | quick | the |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 2 |
| 2 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 2 |
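A document-feature matrix like the one above can be built by hand. The sketch below uses base R only. The two sentences are an assumption, reconstructed from the counts in the table rather than quoted from it, and `docs`, `tokens`, `vocab`, and `dtm` are illustrative names.

```r
# Assumed input: two sentences consistent with the counts in the table above.
docs <- c("the quick brown fox jumps over the lazy dog",
          "the quick brown fox jumps over the lazy cat")

# Tokenise on single spaces and collect the sorted vocabulary.
tokens <- strsplit(docs, " ", fixed = TRUE)
vocab  <- sort(unique(unlist(tokens)))

# Count every vocabulary word in every document (rows = documents).
dtm <- t(sapply(tokens, function(tok) table(factor(tok, levels = vocab))))
rownames(dtm) <- seq_along(docs)

print(dtm)
```

Using `factor(tok, levels = vocab)` keeps words that a document never uses in the table with a count of zero, which is what makes every row the same length.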
for (i in seq_along(M)) {
  w_i <- rmultinom(1, size = M[i], prob = mu)  # word counts for document i
  txts[i] <- paste(rep(V, times = w_i[, 1]), collapse = " ")  # spell the counts out as text
  print(paste("Document", i, ":", txts[i]))
}
[1] "Document 1 : cat dog"
[1] "Document 2 : cat dog dog"
[1] "Document 3 : cat cat cat"
[1] "Document 4 : cat dog dog fox fox"
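The loop above depends on objects defined elsewhere (`V`, `M`, `mu`, `txts`). A self-contained sketch of the same simulation follows. The vocabulary and document lengths are read off the printed output, while `mu = (0.5, 0.25, 0.25)` and the seed are assumptions, so the sampled documents will generally differ from those shown.

```r
set.seed(1)                      # assumed seed, only for reproducibility

V  <- c("cat", "dog", "fox")     # vocabulary (from the printed documents)
mu <- c(0.5, 0.25, 0.25)         # assumed word probabilities
M  <- c(2, 3, 3, 5)              # document lengths (from the printed documents)

# One multinomial draw per document; rows = documents, columns = words.
W <- t(sapply(M, function(m) rmultinom(1, size = m, prob = mu)[, 1]))
colnames(W) <- V

# Turn each row of counts back into a bag-of-words string.
txts <- apply(W, 1, function(w) paste(rep(V, times = w), collapse = " "))

print(W)
print(txts)
```

The count matrix `W` is exactly what a document-feature matrix of `txts` would recover, since word order carries no information under the multinomial model.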
Document-feature matrix of: 4 documents, 3 features (33.33% sparse) and 0 docvars.
       features
docs    cat dog fox
  text1   1   1   0
  text2   1   2   0
  text3   3   0   0
  text4   1   2   2
\[\mathbf{w}_1 = \left(1, 1, 0 \right)\quad\mathbf{w}_4 = \left(1, 2, 2 \right)\]
mu:
\[P(\mathbf{w}_i \mid \mu) = \frac{M_i!}{\prod_{j=1}^J (w_{ij}!)} \prod_{j=1}^J \mu_j^{w_{ij}}\]
\[P(\mathbf{w}_i \mid \mu) = \frac{(1 + 1 + 2)!}{1! \times1! \times2!} \times \left[ 0.5^1 \times 0.25^1 \times 0.25^2 \right] = \frac{24}{2} \times 0.0078125 = 0.09375\]
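The arithmetic can be checked in R. This sketch takes the counts (1, 1, 2) and the probabilities (0.5, 0.25, 0.25) that appear in the worked example, and compares the formula with base R's `dmultinom()`; the word labels and variable names are illustrative.

```r
w  <- c(cat = 1, dog = 1, fox = 2)   # counts used in the worked example
mu <- c(0.5, 0.25, 0.25)             # word probabilities used in the worked example

# Multinomial coefficient times the product of mu_j^w_j.
p_manual  <- factorial(sum(w)) / prod(factorial(w)) * prod(mu^w)

# The same quantity from the built-in multinomial density.
p_builtin <- dmultinom(w, prob = mu)

print(p_manual)    # 0.09375
print(p_builtin)   # agrees with p_manual up to floating-point error
```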
| | by | man | upon |
|---|---|---|---|
| Hamilton | 859 | 102 | 374 |
| Jay | 82 | 0 | 1 |
| Madison | 474 | 17 | 7 |
| Unlabeled | 15 | 2 | 0 |
\[P(\mathbf{w}_{\text{disputed}} \mid \hat{\mu}_{\mathcal{H}}) = \frac{17!}{(15!)(2!)(0!)} \left(0.64^{15} \times 0.08^{2} \times 0.28^{0} \right) = 0.001\]
\[P(\mathbf{w}_{\text{disputed}} \mid \hat{\mu}_{\mathcal{J}}) = \frac{17!}{(15!)(2!)(0!)} \left(0.99^{15} \times 0^{2} \times 0.01^{0} \right) = 0\]
\[P(\mathbf{w}_{\text{disputed}} \mid \hat{\mu}_{\mathcal{M}}) = \frac{17!}{(15!)(2!)(0!)} \left(0.95^{15} \times 0.035^{2} \times 0.015^{0} \right) = 0.077\]
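The calculation clearly favours Madison as the author: the disputed counts are far more likely under Madison's estimated word probabilities (0.077) than under Hamilton's (0.001), and they have probability zero under Jay's, because Jay never uses "man" in his labelled essays.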
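The three likelihoods can be reproduced from the count table. This is a minimal sketch in base R, assuming the word probabilities are estimated as each author's relative frequencies of the three words. It uses the unrounded estimates, so the values may differ slightly in the third decimal place from the figures computed with rounded probabilities. The names `counts`, `mu_hat`, `disputed`, and `lik` are illustrative.

```r
# Word counts in the essays of known authorship (from the table).
counts <- rbind(Hamilton = c(859, 102, 374),
                Jay      = c(82,    0,   1),
                Madison  = c(474,  17,   7))
colnames(counts) <- c("by", "man", "upon")

# Estimated word probabilities: each row divided by its total.
mu_hat <- counts / rowSums(counts)

# Word counts in the unlabeled (disputed) text.
disputed <- c(by = 15, man = 2, upon = 0)

# Multinomial likelihood of the disputed counts under each author's probabilities.
lik <- apply(mu_hat, 1, function(mu) dmultinom(disputed, prob = mu))

print(round(mu_hat, 3))
print(lik)
print(names(which.max(lik)))   # author with the highest likelihood
```

Jay's likelihood is exactly zero because a single unseen word ("man") forces the whole product to zero, however well the other words fit.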
\[\mathbf{b} \sim \text{Dir}(\alpha) \quad \mathbf{b} = (p_1, p_2, \ldots, p_K) \quad \text{where} \quad p_k \geq 0 \quad \text{and} \quad \sum_{k=1}^{K} p_k = 1\]
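To get a feel for what a Dirichlet draw looks like, the sketch below samples from it in base R by normalising independent gamma draws, a standard construction. The helper `rdirichlet_sketch()` is a hypothetical name written for this illustration, not a function from any package, and the two `alpha` values and the seed are arbitrary choices.

```r
# Hypothetical helper: draw n vectors from Dir(alpha) via normalised gamma draws.
rdirichlet_sketch <- function(n, alpha) {
  K <- length(alpha)
  # Column k holds n draws from Gamma(shape = alpha[k], rate = 1).
  g <- matrix(rgamma(n * K, shape = rep(alpha, each = n)), nrow = n, ncol = K)
  g / rowSums(g)   # each row now sums to 1
}

set.seed(42)   # assumed seed, only for reproducibility

b_flat   <- rdirichlet_sketch(5, alpha = c(1, 1, 1))     # spread over the simplex
b_peaked <- rdirichlet_sketch(5, alpha = c(10, 10, 10))  # concentrated near (1/3, 1/3, 1/3)

print(round(b_flat, 3))
print(round(b_peaked, 3))
```

Every row is a valid probability vector; larger values of `alpha` pull the draws towards equal proportions, while smaller values allow more uneven mixtures.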
topicmodels package.
| | Text | Types | Tokens | Sentences | Year | President | FirstName | Party |
|---|---|---|---|---|---|---|---|---|
| 1 | 1789-Washington | 625 | 1537 | 23 | 1789 | Washington | George | none |
| 2 | 1793-Washington | 96 | 147 | 4 | 1793 | Washington | George | none |
| 3 | 1797-Adams | 826 | 2577 | 37 | 1797 | Adams | John | Federalist |
| 4 | 1801-Jefferson | 717 | 1923 | 41 | 1801 | Jefferson | Thomas | Democratic-Republican |
| 5 | 1805-Jefferson | 804 | 2380 | 45 | 1805 | Jefferson | Thomas | Democratic-Republican |
| 6 | 1809-Madison | 535 | 1261 | 21 | 1809 | Madison | James | Democratic-Republican |
Formal class 'LDA_Gibbs' [package "topicmodels"] with 16 slots
..@ seedwords : NULL
..@ z : int [1:65652] 2 2 3 4 2 2 2 5 2 1 ...
..@ alpha : num 10
..@ call : language topicmodels::LDA(x = inaugural_dfm, k = 5, method = "Gibbs")
..@ Dim : int [1:2] 59 9209
..@ control :Formal class 'LDA_Gibbscontrol' [package "topicmodels"] with 14 slots
.. .. ..@ delta : num 0.1
.. .. ..@ iter : int 2000
.. .. ..@ thin : int 2000
.. .. ..@ burnin : int 0
.. .. ..@ initialize : chr "random"
.. .. ..@ alpha : num 10
.. .. ..@ seed : int NA
.. .. ..@ verbose : int 0
.. .. ..@ prefix : chr "/tmp/Rtmpfppzq2/file2fd60f572a4226"
.. .. ..@ save : int 0
.. .. ..@ nstart : int 1
.. .. ..@ best : logi TRUE
.. .. ..@ keep : int 0
.. .. ..@ estimate.beta: logi TRUE
..@ k : int 5
..@ terms : chr [1:9209] "fellow-citizens" "senate" "house" "representatives" ...
..@ documents : chr [1:59] "1789-Washington" "1793-Washington" "1797-Adams" "1801-Jefferson" ...
..@ beta : num [1:5, 1:9209] -11.75 -5.98 -11.78 -11.91 -8.87 ...
..@ gamma : num [1:59, 1:5] 0.114 0.179 0.119 0.14 0.118 ...
..@ wordassignments:List of 5
.. ..$ i : int [1:39894] 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ j : int [1:39894] 1 2 3 4 5 6 7 8 9 10 ...
.. ..$ v : num [1:39894] 2 2 5 2 2 5 2 1 5 5 ...
.. ..$ nrow: int 59
.. ..$ ncol: int 9209
.. ..- attr(*, "class")= chr "simple_triplet_matrix"
..@ loglikelihood : num -485209
..@ iter : int 2000
..@ logLiks : num(0)
..@ n : int 65652
Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
[1,] "world" "government" "government" "us" "country"
[2,] "peace" "people" "upon" "new" "every"
[3,] "can" "constitution" "shall" "let" "great"
[4,] "freedom" "states" "laws" "america" "united"
[5,] "must" "union" "people" "world" "public"
[6,] "life" "may" "congress" "people" "states"
[7,] "nations" "power" "law" "one" "war"
[8,] "justice" "one" "must" "can" "foreign"
[9,] "nation" "can" "now" "nation" "best"
[10,] "men" "upon" "public" "must" "us"
[11,] "free" "powers" "can" "time" "citizens"
[12,] "shall" "spirit" "service" "today" "duties"
[13,] "people" "rights" "made" "american" "good"
[14,] "human" "shall" "policy" "now" "just"
[15,] "faith" "executive" "great" "every" "without"
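The top terms of each topic are simply the terms with the largest entries in that topic's row of `beta`, which `topicmodels` stores as log probabilities (hence the negative values in the `beta` slot). The sketch below illustrates the ranking on a made-up 2-topic, 4-word example; the vocabulary and probabilities are hypothetical and are not taken from the fitted model.

```r
# Hypothetical mini-vocabulary and made-up topic-word probabilities (rows = topics).
terms_vec <- c("government", "people", "peace", "world")
phi <- rbind(c(0.50, 0.30, 0.15, 0.05),
             c(0.05, 0.15, 0.30, 0.50))

beta <- log(phi)   # log scale, as in the beta slot of a fitted model

# Exponentiating a row recovers a probability distribution over the vocabulary.
stopifnot(all(abs(rowSums(exp(beta)) - 1) < 1e-12))

# For each topic, keep the two terms with the largest log probability.
top_terms <- apply(beta, 1, function(b) terms_vec[order(b, decreasing = TRUE)[1:2]])
colnames(top_terms) <- paste("Topic", seq_len(ncol(top_terms)))

print(top_terms)
```

Because the logarithm is monotone, ranking by `beta` and ranking by `exp(beta)` give the same ordering.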
1 2 3 4 5
1789-Washington 0.11396011 0.3418803 0.10113960 0.07122507 0.3717949
1793-Washington 0.17857143 0.2500000 0.18750000 0.17857143 0.2053571
1797-Adams 0.11875000 0.3205357 0.10267857 0.05892857 0.3991071
1801-Jefferson 0.14020857 0.3140209 0.07184241 0.18539977 0.2885284
1805-Jefferson 0.11781338 0.2337418 0.11215834 0.06786051 0.4684260
1809-Madison 0.14756944 0.2673611 0.11111111 0.05902778 0.4149306
1813-Madison 0.17617450 0.2030201 0.13255034 0.07382550 0.4144295
1817-Monroe 0.04779640 0.2197393 0.11855990 0.03041589 0.5834885
1821-Monroe 0.02909796 0.1939864 0.11057226 0.02861300 0.6377304
1825-Adams 0.11024735 0.3279152 0.09752650 0.05795053 0.4063604
1829-Jackson 0.08762887 0.3092784 0.13058419 0.03951890 0.4329897
1833-Jackson 0.11519199 0.3672788 0.10350584 0.07178631 0.3422371
1837-VanBuren 0.09428130 0.3570325 0.11798042 0.05770222 0.3730036
1841-Harrison 0.04446178 0.6263651 0.10894436 0.03406136 0.1861674
1845-Polk 0.04453091 0.4803286 0.13143104 0.03977518 0.3039343
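Each row of this matrix is a document's estimated topic mixture, so a simple summary is the topic with the largest share per document. The sketch below copies three rows from the output above into a small matrix (the object name `gamma` and the choice of rows are illustrative) and picks the dominant topic with `which.max()`.

```r
# Three rows copied from the topic-proportion output above.
gamma <- rbind(
  "1789-Washington" = c(0.11396011, 0.3418803, 0.10113960, 0.07122507, 0.3717949),
  "1821-Monroe"     = c(0.02909796, 0.1939864, 0.11057226, 0.02861300, 0.6377304),
  "1841-Harrison"   = c(0.04446178, 0.6263651, 0.10894436, 0.03406136, 0.1861674)
)
colnames(gamma) <- 1:5

# Topic proportions sum to one within each document (up to rounding in the printout).
print(rowSums(gamma))

# Index of the topic with the largest share in each document.
dominant <- apply(gamma, 1, which.max)
print(dominant)
```

Topic labels are arbitrary and, with no seed set in the Gibbs sampler, can change between runs, so the indices only make sense alongside the top-terms table from the same fit.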