POP77032 Quantitative Text Analysis for Social Scientists
| Document | brown | cat | dog | fox | jumps | lazy | over | quick | the |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 2 |
| 2 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 2 |
[1] "Document 1 : cat dog"
[1] "Document 2 : cat dog dog"
[1] "Document 3 : cat cat cat"
[1] "Document 4 : cat dog dog fox fox"
Document-feature matrix of: 4 documents, 3 features (33.33% sparse) and 0 docvars.
features
docs cat dog fox
text1 1 1 0
text2 1 2 0
text3 3 0 0
text4 1 2 2
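
The counts above can be reproduced by hand. The following is a minimal base-R sketch of how a document-feature matrix is assembled, assuming simple whitespace tokenisation; it is not how the package that printed the output above builds its matrix, and the names `docs`, `vocab`, and `dtm` are illustrative.

```r
# Four toy documents, named as in the printed matrix
docs <- c(
  text1 = "cat dog",
  text2 = "cat dog dog",
  text3 = "cat cat cat",
  text4 = "cat dog dog fox fox"
)

# Assumption: tokens are separated by single spaces
toks  <- strsplit(docs, " ", fixed = TRUE)
vocab <- sort(unique(unlist(toks)))

# Count each vocabulary item per document (one row per document)
dtm <- t(vapply(
  toks,
  function(tok) as.integer(table(factor(tok, levels = vocab))),
  integer(length(vocab))
))
colnames(dtm) <- vocab

print(dtm)

# Share of zero cells, i.e. the sparsity reported in the header line
sparsity <- mean(dtm == 0)
print(round(100 * sparsity, 2))
```

Each row of `dtm` is the count vector for one document, so `dtm["text4", ]` corresponds to the vector written as w_4 in the notation used next.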
\[\mathbf{w}_1 = \left(1, 1, 0 \right)\quad\mathbf{w}_4 = \left(1, 2, 2 \right)\]
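Probability of observing document \(\mathbf{w}_i\) given term probabilities \(\mu\), where \(M_i = \sum_{j=1}^J w_{ij}\) is the number of tokens in the document:
\[P(\mathbf{w}_i \mid \mu) = \frac{M_i!}{\prod_{j=1}^J (w_{ij}!)} \prod_{j=1}^J \mu_j^{w_{ij}}\]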
\[P(\mathbf{w}_i \mid \mu) = \frac{(1 + 1 + 2)!}{1! \times1! \times2!} \times \left[ 0.5^1 \times 0.25^1 \times 0.25^2 \right] = \frac{24}{2} \times 0.0078125 = 0.09375\]
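
The arithmetic above can be checked in base R. This sketch computes the same probability twice, once term by term from the formula and once with `dmultinom()` from the stats package; the variable names are illustrative.

```r
# Counts and term probabilities taken from the worked example
w  <- c(1, 1, 2)
mu <- c(0.5, 0.25, 0.25)

# Term by term: multinomial coefficient times the product of mu_j^w_j
coefficient <- factorial(sum(w)) / prod(factorial(w))
by_hand     <- coefficient * prod(mu^w)

# Same quantity from the built-in multinomial density
built_in <- dmultinom(w, prob = mu)

print(c(by_hand = by_hand, built_in = built_in))
```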
| Author | by | man | upon |
|---|---|---|---|
| Hamilton | 859 | 102 | 374 |
| Jay | 82 | 0 | 1 |
| Madison | 474 | 17 | 7 |
| Unlabeled | 15 | 2 | 0 |
\[P(\mathbf{w}_{\text{disputed}} \mid \hat{\mu}_{\mathcal{H}}) = \frac{17!}{(15!)(2!)(0!)} \left(0.64^{15} \times 0.08^{2} \times 0.28^{0} \right) = 0.001\]
\[P(\mathbf{w}_{\text{disputed}} \mid \hat{\mu}_{\mathcal{J}}) = \frac{17!}{(15!)(2!)(0!)} \left(0.99^{15} \times 0^{2} \times 0.01^{0} \right) = 0\]
\[P(\mathbf{w}_{\text{disputed}} \mid \hat{\mu}_{\mathcal{M}}) = \frac{17!}{(15!)(2!)(0!)} \left(0.95^{15} \times 0.035^{2} \times 0.015^{0} \right) = 0.077\]
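The calculation favours Madison as the author: his likelihood (0.077) is well over an order of magnitude larger than Hamilton's (0.001). Jay's likelihood is exactly zero because "man" occurs twice in the unlabeled text but never in his labelled texts.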
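
The comparison can be reproduced from the count table in base R. This is a sketch, assuming each author's term probabilities are estimated as simple relative frequencies of the three words; because it uses unrounded probabilities, the results differ slightly from the hand calculation with rounded values. Variable names are illustrative.

```r
# Word counts per author, copied from the table
counts <- rbind(
  Hamilton = c(by = 859, man = 102, upon = 374),
  Jay      = c(by = 82,  man = 0,   upon = 1),
  Madison  = c(by = 474, man = 17,  upon = 7)
)
disputed <- c(by = 15, man = 2, upon = 0)

# Assumption: mu-hat is the row-wise relative frequency (no smoothing)
mu_hat <- counts / rowSums(counts)

# Multinomial likelihood of the disputed counts under each author
likelihood <- apply(mu_hat, 1, function(mu) dmultinom(disputed, prob = mu))

print(round(mu_hat, 3))
print(signif(likelihood, 3))

# Author with the highest likelihood
best <- names(which.max(likelihood))
print(best)
```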
\[\mathbf{b} \sim Dir(\alpha) \quad \mathbf{b} = (p_1, p_2, \ldots, p_K) \quad \text{where} \quad \sum_{k=1}^{K} p_k = 1\]
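
To see what draws from a Dirichlet look like, the sketch below samples from it in base R by normalising independent Gamma draws, a standard construction. The helper `rdirichlet_sketch()` is hypothetical (base R has no Dirichlet sampler), and the concentration values are chosen only for illustration.

```r
# Hypothetical helper: n draws from Dir(alpha) via normalised Gamma variates
rdirichlet_sketch <- function(n, alpha) {
  K <- length(alpha)
  # Column k holds n draws from Gamma(shape = alpha[k], rate = 1)
  g <- matrix(rgamma(n * K, shape = rep(alpha, each = n), rate = 1),
              nrow = n, ncol = K)
  # Dividing each row by its sum puts it on the simplex
  g / rowSums(g)
}

set.seed(42)  # illustrative seed

# Large alpha: proportions cluster around 1/K
even_draws <- rdirichlet_sketch(3, alpha = rep(10, 5))

# Small alpha: most mass tends to fall on a few components
sparse_draws <- rdirichlet_sketch(3, alpha = rep(0.1, 5))

print(round(even_draws, 3))
print(round(sparse_draws, 3))
```

Every row sums to 1, matching the constraint in the definition above.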
The LDA model below is fitted with the topicmodels package.

| | Text | Types | Tokens | Sentences | Year | President | FirstName | Party |
|---|---|---|---|---|---|---|---|---|
| 1 | 1789-Washington | 625 | 1537 | 23 | 1789 | Washington | George | none |
| 2 | 1793-Washington | 96 | 147 | 4 | 1793 | Washington | George | none |
| 3 | 1797-Adams | 826 | 2577 | 37 | 1797 | Adams | John | Federalist |
| 4 | 1801-Jefferson | 717 | 1923 | 41 | 1801 | Jefferson | Thomas | Democratic-Republican |
| 5 | 1805-Jefferson | 804 | 2380 | 45 | 1805 | Jefferson | Thomas | Democratic-Republican |
| 6 | 1809-Madison | 535 | 1261 | 21 | 1809 | Madison | James | Democratic-Republican |
Formal class 'LDA_Gibbs' [package "topicmodels"] with 16 slots
..@ seedwords : NULL
..@ z : int [1:67100] 3 3 2 2 3 3 3 3 2 4 ...
..@ alpha : num 10
..@ call : language topicmodels::LDA(x = inaugural_dfm, k = 5, method = "Gibbs")
..@ Dim : int [1:2] 60 9360
..@ control :Formal class 'LDA_Gibbscontrol' [package "topicmodels"] with 14 slots
.. .. ..@ delta : num 0.1
.. .. ..@ iter : int 2000
.. .. ..@ thin : int 2000
.. .. ..@ burnin : int 0
.. .. ..@ initialize : chr "random"
.. .. ..@ alpha : num 10
.. .. ..@ seed : int NA
.. .. ..@ verbose : int 0
.. .. ..@ prefix : chr "/tmp/RtmpebPArD/filedba98123816e6"
.. .. ..@ save : int 0
.. .. ..@ nstart : int 1
.. .. ..@ best : logi TRUE
.. .. ..@ keep : int 0
.. .. ..@ estimate.beta: logi TRUE
..@ k : int 5
..@ terms : chr [1:9360] "fellow-citizens" "senate" "house" "representatives" ...
..@ documents : chr [1:60] "1789-Washington" "1793-Washington" "1797-Adams" "1801-Jefferson" ...
..@ beta : num [1:5, 1:9360] -11.95 -11.9 -5.91 -11.77 -11.86 ...
..@ gamma : num [1:60, 1:5] 0.0855 0.125 0.058 0.1645 0.0631 ...
..@ wordassignments:List of 5
.. ..$ i : int [1:40736] 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ j : int [1:40736] 1 2 3 4 5 6 7 8 9 10 ...
.. ..$ v : num [1:40736] 3 3 2 3 3 3 2 4 2 3 ...
.. ..$ nrow: int 60
.. ..$ ncol: int 9360
.. ..- attr(*, "class")= chr "simple_triplet_matrix"
..@ loglikelihood : num -495966
..@ iter : int 2000
..@ logLiks : num(0)
..@ n : int 67100
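
A model with the structure printed above could be fitted roughly as follows. This is a sketch under stated assumptions: the call in the `call` slot confirms `k = 5` and `method = "Gibbs"`, but the preprocessing steps, the conversion step, the seed, and the reduced iteration count are not shown in the source and are guesses made for illustration, so the results will not match the printed output exactly.

```r
library(quanteda)
library(topicmodels)

# Assumption: punctuation, numbers, and English stopwords were removed;
# the source does not show how inaugural_dfm was built
inaugural_tokens <- tokens(data_corpus_inaugural,
                           remove_punct = TRUE,
                           remove_numbers = TRUE)
inaugural_tokens <- tokens_remove(inaugural_tokens, stopwords("en"))
inaugural_dfm <- dfm(inaugural_tokens)

# Convert to the document-term matrix class used by topicmodels
inaugural_dtm <- convert(inaugural_dfm, to = "topicmodels")

# k and method follow the printed call; seed and iter are illustrative
lda_fit <- LDA(inaugural_dtm, k = 5, method = "Gibbs",
               control = list(seed = 1234, iter = 500))

# Top 15 terms per topic
top_terms <- terms(lda_fit, 15)
print(top_terms)

# Document-topic proportions, one row per speech
doc_topics <- posterior(lda_fit)$topics
print(head(round(doc_topics, 3)))
```

Setting a seed makes the Gibbs sampler reproducible; the printed control settings show `seed : int NA`, so repeated runs of the original call would give different topics.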
Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
[1,] "us" "every" "government" "world" "upon"
[2,] "america" "great" "constitution" "peace" "government"
[3,] "new" "united" "power" "can" "people"
[4,] "nation" "states" "people" "freedom" "laws"
[5,] "let" "public" "states" "must" "shall"
[6,] "one" "interests" "union" "life" "congress"
[7,] "time" "war" "can" "free" "law"
[8,] "must" "foreign" "may" "justice" "must"
[9,] "world" "citizens" "upon" "nations" "now"
[10,] "today" "may" "one" "men" "country"
[11,] "people" "without" "shall" "people" "policy"
[12,] "every" "time" "country" "nation" "public"
[13,] "now" "us" "spirit" "hope" "national"
[14,] "can" "government" "institutions" "shall" "american"
[15,] "american" "duties" "duty" "common" "service"
1 2 3 4 5
1789-Washington 0.08547009 0.3490028 0.3504274 0.04843305 0.16666667
1793-Washington 0.12500000 0.2589286 0.2410714 0.10714286 0.26785714
1797-Adams 0.05803571 0.3526786 0.3464286 0.12767857 0.11517857
1801-Jefferson 0.16454229 0.2676709 0.3082271 0.16106605 0.09849363
1805-Jefferson 0.06314797 0.4005655 0.2695570 0.13477851 0.13195099
1809-Madison 0.04861111 0.4670139 0.2743056 0.11631944 0.09375000
1813-Madison 0.07214765 0.3808725 0.2147651 0.16275168 0.16946309
1817-Monroe 0.03041589 0.6083178 0.2048417 0.02979516 0.12662942
1821-Monroe 0.03346266 0.6571290 0.1721629 0.02182347 0.11542192
1825-Adams 0.05583039 0.4070671 0.3010601 0.09540636 0.14063604
1829-Jackson 0.04639175 0.4054983 0.2955326 0.06529210 0.18728522
1833-Jackson 0.09515860 0.3222037 0.3839733 0.10183639 0.09682805
1837-VanBuren 0.05821741 0.3812468 0.3709428 0.06646059 0.12313241
1841-Harrison 0.05356214 0.1882475 0.6211648 0.02938118 0.10764431
1845-Polk 0.02723735 0.3052313 0.4651967 0.06485084 0.13748379
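
Each row above is a distribution over the five topics. The base-R sketch below copies three rows from the printed output, checks that they sum to 1 up to rounding, and picks the dominant topic per speech; the object names are illustrative.

```r
# Three rows copied from the document-topic output above
gamma <- rbind(
  "1789-Washington" = c(0.08547009, 0.3490028, 0.3504274, 0.04843305, 0.16666667),
  "1821-Monroe"     = c(0.03346266, 0.6571290, 0.1721629, 0.02182347, 0.11542192),
  "1841-Harrison"   = c(0.05356214, 0.1882475, 0.6211648, 0.02938118, 0.10764431)
)
colnames(gamma) <- paste("Topic", 1:5)

# Rows are topic proportions, so each should sum to 1 (up to rounding)
row_totals <- rowSums(gamma)
print(round(row_totals, 6))

# Dominant topic: the column with the largest proportion in each row
dominant <- setNames(colnames(gamma)[apply(gamma, 1, which.max)],
                     rownames(gamma))
print(dominant)
```

Note that for 1789-Washington the two largest proportions are nearly tied (Topics 2 and 3), so a single dominant label hides how mixed that speech is.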