POP88162 Introduction to Quantitative Research Methods
Department of Political Science, Trinity College Dublin
\[ \begin{align*} &\sum_{i = 1}^{n} (Y_i - \bar{Y})^2& &=& &\sum_{i = 1}^{n} (Y_i - \hat{Y_i})^2& &+& &\sum_{i = 1}^{n} (\hat{Y_i} - \bar{Y})^2& \\ &TSS& &=& &SSE& &+& &ESS& \end{align*} \]
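where \(TSS\) is the total sum of squares, \(SSE\) is the sum of squared errors (the residual sum of squares), and \(ESS\) is the explained sum of squares. The coefficient of determination is the explained share of the total: \(R^2 = ESS/TSS = 1 - SSE/TSS\).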
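The decomposition can be checked numerically. The sketch below uses simulated data (an assumption for illustration, not the lecture's dataset) and the made-up variable names `x` and `y`; it relies on the model including an intercept, which is what makes the identity hold exactly for OLS.

```r
# Simulated data: purely illustrative, not the lecture's dataset.
set.seed(42)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)

fit <- lm(y ~ x)
y_hat <- fitted(fit)

tss <- sum((y - mean(y))^2)      # total sum of squares
sse <- sum((y - y_hat)^2)        # sum of squared errors (residuals)
ess <- sum((y_hat - mean(y))^2)  # explained sum of squares

all.equal(tss, sse + ess)                     # TSS = SSE + ESS
all.equal(ess / tss, summary(fit)$r.squared)  # R^2 = ESS / TSS
```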
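\[Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + \epsilon_i\] where \(Y_i\) is the outcome for observation \(i\), \(X_{1i}, \ldots, X_{ki}\) are the \(k\) predictors, \(\alpha\) is the intercept, \(\beta_1, \ldots, \beta_k\) are the slope coefficients, and \(\epsilon_i\) is the error term.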
\[ \begin{align*} M_0: Y_i &= \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_{k-1} X_{k-1,i} + \epsilon_i \\ M_a: Y_i &= \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_{k-1} X_{k-1,i} + \beta_k X_{ki} + \epsilon_i \end{align*} \]
\[ \begin{align*} F &= \frac{(SSE_0 - SSE_a)/(k_a - k_0)}{SSE_a/(n - (k_a + 1))} \\ &= \frac{(R^2_a - R^2_0)/(k_a - k_0)}{(1 - R^2_a)/(n - (k_a + 1))} \\ &= \frac{R^2_{change}/df_{change}}{(1 - R^2_a)/(n - (k_a + 1))} \end{align*} \]
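where \(SSE_0\) and \(SSE_a\) are the sums of squared errors of the restricted model \(M_0\) and the full model \(M_a\), \(R^2_0\) and \(R^2_a\) are their \(R^2\) values, \(k_0\) and \(k_a\) are their numbers of predictors, and \(n\) is the number of observations. Under the null hypothesis that the additional coefficients are zero, \(F\) follows an F distribution with \(k_a - k_0\) and \(n - (k_a + 1)\) degrees of freedom.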
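As a rough check that the SSE and \(R^2\) forms of the statistic agree, the sketch below computes both by hand and compares them with `anova()`. The data are simulated and the names `x1`, `x2`, `m0` and `ma` are placeholders chosen for illustration, not objects from the lecture.

```r
# Simulated data: purely illustrative.
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n)

m0 <- lm(y ~ x1)       # restricted model, k_0 = 1 predictor
ma <- lm(y ~ x1 + x2)  # full model, k_a = 2 predictors
k0 <- 1
ka <- 2

# F from the sums of squared errors
sse0 <- sum(resid(m0)^2)
ssea <- sum(resid(ma)^2)
f_sse <- ((sse0 - ssea) / (ka - k0)) / (ssea / (n - (ka + 1)))

# F from the R-squared values
r2_0 <- summary(m0)$r.squared
r2_a <- summary(ma)$r.squared
f_r2 <- ((r2_a - r2_0) / (ka - k0)) / ((1 - r2_a) / (n - (ka + 1)))

# Upper-tail p-value from the F distribution
p_value <- pf(f_sse, df1 = ka - k0, df2 = n - (ka + 1), lower.tail = FALSE)

# The built-in nested-model comparison reports the same statistic
f_anova <- anova(m0, ma)$F[2]
c(f_sse = f_sse, f_r2 = f_r2, f_anova = f_anova, p_value = p_value)
```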
\[log(GDP)_i = \alpha + \beta_1 Democracy_i + \epsilon_i\]
Call:
lm(formula = log(gdp_per_capita) ~ democracy, data = democracy_gdp_2020)
Residuals:
Min 1Q Median 3Q Max
-2.9242 -0.8475 -0.1055 0.9133 3.0186
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.9801 0.1564 51.039 < 2e-16 ***
democracy 1.1000 0.1990 5.527 1.18e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.28 on 173 degrees of freedom
(20 observations deleted due to missingness)
Multiple R-squared: 0.1501, Adjusted R-squared: 0.1452
F-statistic: 30.54 on 1 and 173 DF, p-value: 1.183e-07
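Because the outcome is logged, the democracy coefficient is a difference in log GDP per capita, and exponentiating it gives a ratio. A short worked conversion using the estimate above (a descriptive association, read here as a ratio of geometric-mean GDP per capita between the two groups):

\[ \frac{\widehat{GDP}_{Democracy = 1}}{\widehat{GDP}_{Democracy = 0}} = e^{\hat{\beta}_1} = e^{1.1000} \approx 3.00 \]

That is, democracies in this sample have roughly three times the GDP per capita of non-democracies.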
\[log(GDP)_i = \alpha + \beta_1 Democracy_i + \beta_2 log(Longevity)_i + \epsilon_i\]
lm_fit_2 <- lm(log(gdp_per_capita) ~ democracy + log(democracy_duration), data = democracy_gdp_2020)
summary(lm_fit_2)
Call:
lm(formula = log(gdp_per_capita) ~ democracy + log(democracy_duration),
data = democracy_gdp_2020)
Residuals:
Min 1Q Median 3Q Max
-2.85521 -0.87765 -0.07444 0.82037 3.10558
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.71571 0.34602 16.519 < 2e-16 ***
democracy 1.17576 0.17567 6.693 2.96e-10 ***
log(democracy_duration) 0.62745 0.08795 7.134 2.60e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.128 on 172 degrees of freedom
(20 observations deleted due to missingness)
Multiple R-squared: 0.3441, Adjusted R-squared: 0.3365
F-statistic: 45.12 on 2 and 172 DF, p-value: < 2.2e-16
\[ \begin{align*} F &= \frac{(R^2_a - R^2_0)/(k_a - k_0)}{(1 - R^2_a)/(n - (k_a + 1))} \\ &= \frac{(0.344 - 0.150)/(2 - 1)}{(1 - 0.344)/(175 - (2 + 1))} \\ &= \frac{0.194}{0.003814} \approx 50.87 \end{align*} \]
The same comparison with the anova() command:
Analysis of Variance Table
Model 1: log(gdp_per_capita) ~ democracy
Model 2: log(gdp_per_capita) ~ democracy + log(democracy_duration)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 173 283.36
2 172 218.66 1 64.7 50.893 2.603e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
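The hand calculation and the `anova()` table differ slightly in the second decimal place. A small sketch, plugging in only the numbers printed in the outputs above, suggests this is rounding in the reported \(R^2\) values rather than a different test; the variable names are made up for illustration.

```r
# Values copied from the two summary() outputs and the anova() table.
r2_0 <- 0.1501
r2_a <- 0.3441
rss_0 <- 283.36
rss_a <- 218.66
n <- 175  # 173 residual df + 2 estimated coefficients in Model 1
k_0 <- 1
k_a <- 2

df_change <- k_a - k_0
df_resid <- n - (k_a + 1)

# F from the rounded R-squared values (about 50.87)
f_from_r2 <- ((r2_a - r2_0) / df_change) / ((1 - r2_a) / df_resid)

# F from the residual sums of squares (about 50.89, as in the anova() table)
f_from_rss <- ((rss_0 - rss_a) / df_change) / (rss_a / df_resid)

# Upper-tail p-value on 1 and 172 degrees of freedom
p <- pf(f_from_rss, df_change, df_resid, lower.tail = FALSE)

c(f_from_r2 = f_from_r2, f_from_rss = f_from_rss, p = p)
```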
Does democratization increase education provision?
| Country | \(X_{region}\) |
|---|---|
| Afghanistan | Asia |
| Albania | EE |
| Algeria | MENA |
| Argentina | LA |
| Australia | Advanced |
| \(\vdots\) | \(\vdots\) |
| Country | \(X_{region}\) |
|---|---|
| Afghanistan | 2 |
| Albania | 3 |
| Algeria | 5 |
| Argentina | 4 |
| Australia | 1 |
| \(\vdots\) | \(\vdots\) |
| Country | \(X_{Asia}\) | \(X_{EE}\) | \(X_{LA}\) | \(X_{MENA}\) | \(X_{Sub-Saharan}\) |
|---|---|---|---|---|---|
| Afghanistan | 1 | 0 | 0 | 0 | 0 |
| Albania | 0 | 1 | 0 | 0 | 0 |
| Algeria | 0 | 0 | 0 | 1 | 0 |
| Argentina | 0 | 0 | 1 | 0 | 0 |
| Australia | 0 | 0 | 0 | 0 | 0 |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
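The dummy-variable table above is what R builds automatically when a categorical variable is stored as a factor. The sketch below reconstructs the five example rows by hand (the short region labels and the object names are taken from the tables or made up for illustration, not from the lecture's dataset) and expands them with `model.matrix()`. The first factor level, Advanced, becomes the reference category and gets no column of its own.

```r
countries <- c("Afghanistan", "Albania", "Algeria", "Argentina", "Australia")

# Store region as a factor; the first level is the reference category.
region <- factor(
  c("Asia", "EE", "MENA", "LA", "Advanced"),
  levels = c("Advanced", "Asia", "EE", "LA", "MENA", "Sub-Saharan")
)

# Expand the factor into an intercept plus one 0/1 column per non-reference level.
X <- model.matrix(~ region)
rownames(X) <- countries
X
```

Coding the regions as the integers 1 to 6 and entering them as a single numeric predictor would instead force a common step between consecutive codes, which has no meaning for an unordered variable.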
Advanced Economies Asia and the Pacific
936 624
Eastern Europe Latin America and the Caribbean
312 975
Middle East and North Africa Sub-Saharan Africa
507 897
(Intercept) regionAsia and the Pacific regionEastern Europe
3549 1 1 0
1716 1 0 1
3081 1 0 0
1014 1 0 0
4173 1 0 0
1521 1 0 0
regionLatin America and the Caribbean regionMiddle East and North Africa
3549 0 0
1716 0 0
3081 0 1
1014 1 0
4173 0 0
1521 0 0
regionSub-Saharan Africa
3549 0
1716 0
3081 0
1014 0
4173 0
1521 0
Call:
lm(formula = primary_ser ~ democracy + region, data = paglayan2021_2010)
Residuals:
Min 1Q Median 3Q Max
-28.6254 -0.9387 0.8562 1.7045 10.9446
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 96.0829 2.2650 42.421 < 2e-16 ***
democracy 3.0921 1.7851 1.732 0.08636 .
regionAsia and the Pacific -0.8279 2.4412 -0.339 0.73524
regionEastern Europe -0.8918 2.9448 -0.303 0.76264
regionLatin America and the Caribbean -0.8423 1.9634 -0.429 0.66884
regionMiddle East and North Africa 2.4460 2.8361 0.862 0.39053
regionSub-Saharan Africa -7.0275 2.1680 -3.241 0.00162 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.83 on 99 degrees of freedom
(3 observations deleted due to missingness)
Multiple R-squared: 0.2113, Adjusted R-squared: 0.1635
F-statistic: 4.421 on 6 and 99 DF, p-value: 0.0005304
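Advanced Economies is the omitted reference category, so each region coefficient is a difference from that group, holding democracy fixed. As a worked illustration using the estimates above, the fitted values for a democracy in two regions are:

\[ \begin{align*} \widehat{SER}_{Advanced} &= 96.0829 + 3.0921 = 99.1750 \\ \widehat{SER}_{Sub-Saharan} &= 96.0829 + 3.0921 - 7.0275 = 92.1475 \end{align*} \]

The difference between the two, \(-7.0275\), is exactly the Sub-Saharan Africa coefficient.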
| Country | \(Y_{SER}\) | \(X_{year}\) | \(X_{region}\) | \(X_{democracy}\) |
|---|---|---|---|---|
| Afghanistan | 0 | 1820 | 2 | 0 |
| Afghanistan | 0 | 1825 | 2 | 0 |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| Albania | 0.15 | 1820 | 3 | NA |
| Albania | 0.19 | 1825 | 3 | NA |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
As a first approach, we can treat each country-year as a distinct data point.
This is called a pooled model: we combine all the country-years into a common pool of data points.
This specification is essentially equivalent to the one we used before, with region entering as a set of dummy variables: \[SER_i = \alpha + \beta_1 Democracy_i + \beta_2 Region_i + \epsilon_i\]
Call:
lm(formula = primary_ser ~ democracy + region, data = paglayan2021)
Residuals:
Min 1Q Median 3Q Max
-86.554 -22.469 2.598 19.613 65.030
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 51.123 1.351 37.847 < 2e-16 ***
democracy 41.291 1.351 30.557 < 2e-16 ***
regionAsia and the Pacific -12.164 2.053 -5.925 3.55e-09 ***
regionEastern Europe 9.928 2.503 3.966 7.51e-05 ***
regionLatin America and the Caribbean -16.153 1.567 -10.311 < 2e-16 ***
regionMiddle East and North Africa 1.326 2.393 0.554 0.580
regionSub-Saharan Africa -3.063 2.143 -1.429 0.153
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 29.14 on 2489 degrees of freedom
(1755 observations deleted due to missingness)
Multiple R-squared: 0.3713, Adjusted R-squared: 0.3697
F-statistic: 245 on 6 and 2489 DF, p-value: < 2.2e-16
| | Pooled | FEs |
|---|---|---|
| (Intercept) | 51.123 | 30.215 |
| | (1.351) | (4.211) |
| democracy | 41.291 | 9.874 |
| | (1.351) | (1.143) |
| regionAsia and the Pacific | −12.164 | −34.053 |
| | (2.053) | (1.510) |
| regionEastern Europe | 9.928 | −9.723 |
| | (2.503) | (1.802) |
| regionLatin America and the Caribbean | −16.153 | −28.349 |
| | (1.567) | (1.128) |
| regionMiddle East and North Africa | 1.326 | −31.701 |
| | (2.393) | (1.807) |
| regionSub-Saharan Africa | −3.063 | −43.322 |
| | (2.143) | (1.717) |
| Year fixed effects | No | Yes |
| Num.Obs. | 2496 | 2496 |
| R2 | 0.371 | 0.696 |
| R2 Adj. | 0.370 | 0.690 |

Standard errors in parentheses.
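The table does not show how the FEs column was estimated. One common way to add year fixed effects in base R is to include `factor(year)` in the formula, and that is the assumption behind the sketch below. It uses a simulated panel (not the Paglayan data) in which both democracy and enrolment rise over time, so the pooled estimate absorbs the shared time trend while the year dummies remove it. All names and the true effect of 5 are made up for illustration.

```r
# Simulated panel: 50 hypothetical countries observed every 5 years.
set.seed(1)
panel <- expand.grid(country = 1:50, year = seq(1900, 2000, by = 5))
trend <- (panel$year - 1900) / 100  # common time trend, from 0 to 1

# Democracy becomes more likely over time; enrolment also rises over time.
panel$democracy <- rbinom(nrow(panel), size = 1, prob = 0.1 + 0.8 * trend)
true_effect <- 5
panel$ser <- 20 + true_effect * panel$democracy + 60 * trend +
  rnorm(nrow(panel), sd = 5)

# Pooled model: ignores the time dimension entirely.
pooled <- lm(ser ~ democracy, data = panel)

# Year fixed effects: one dummy per year (first year is the reference).
fe <- lm(ser ~ democracy + factor(year), data = panel)

b_pooled <- coef(pooled)[["democracy"]]
b_fe <- coef(fe)[["democracy"]]
c(pooled = b_pooled, year_fe = b_fe, truth = true_effect)
```

In this simulation the pooled coefficient is far above the true effect, while the fixed-effects coefficient is close to it. This mirrors the direction of the change in the democracy coefficient between the two columns of the table, although the simulation says nothing about why the real estimates differ.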
\[Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i\]
\[Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i}+ \epsilon_i\]
\[log(GDP)_i = \alpha + \beta_1 Democracy_i + \beta_2 log(Longevity)_i + \beta_3 Democracy_i \times log(Longevity)_i + \epsilon_i\]
lm_fit_6 <- lm(log(gdp_per_capita) ~ democracy * log(democracy_duration), data = democracy_gdp_2020)
summary(lm_fit_6)
Call:
lm(formula = log(gdp_per_capita) ~ democracy * log(democracy_duration),
data = democracy_gdp_2020)
Residuals:
Min 1Q Median 3Q Max
-2.4386 -0.7792 0.0015 0.7448 3.1699
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.0479 0.4590 15.355 < 2e-16 ***
democracy -1.3234 0.6206 -2.133 0.0344 *
log(democracy_duration) 0.2583 0.1218 2.120 0.0355 *
democracy:log(democracy_duration) 0.7037 0.1682 4.183 4.59e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.077 on 171 degrees of freedom
(20 observations deleted due to missingness)
Multiple R-squared: 0.405, Adjusted R-squared: 0.3946
F-statistic: 38.8 on 3 and 171 DF, p-value: < 2.2e-16
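With an interaction, the democracy coefficient on its own is the estimated gap only where `log(democracy_duration)` equals zero, so the gap has to be evaluated at specific durations. The sketch below plugs in the estimates printed above; the helper `democracy_gap()` and the other names are made up for illustration, and durations are in whatever units `democracy_duration` is measured in (presumably years).

```r
# Estimates copied from the summary(lm_fit_6) output above.
b <- c(
  intercept = 7.0479,
  democracy = -1.3234,
  log_duration = 0.2583,
  interaction = 0.7037
)

# Slope of log(GDP) on log(duration) within each regime type.
slope_non_democracy <- b[["log_duration"]]
slope_democracy <- b[["log_duration"]] + b[["interaction"]]

# Estimated democracy vs. non-democracy gap in log(GDP) at a given duration.
democracy_gap <- function(duration) {
  b[["democracy"]] + b[["interaction"]] * log(duration)
}
democracy_gap(c(1, 10, 50))

# Duration at which the estimated gap switches from negative to positive.
crossover <- exp(-b[["democracy"]] / b[["interaction"]])
crossover
```

The estimated gap is negative for very young regimes and positive for durations above the crossover, which lies between 6 and 7 in these units.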
# Scatterplot: non-democracies in colour 1 (black), democracies in colour 2 (red)
plot(log(democracy_gdp_2020$democracy_duration), log(democracy_gdp_2020$gdp_per_capita),
  xlab = "Duration of Political Regime (log)", ylab = "GDP per capita (log)",
  pch = 19, col = democracy_gdp_2020$democracy + 1
)
# Dashed lines: additive model (lm_fit_2), different intercepts but a common slope
abline(a = coef(lm_fit_2)[1], b = coef(lm_fit_2)[3], col = 1, lty = 2)
abline(a = coef(lm_fit_2)[1] + coef(lm_fit_2)[2], b = coef(lm_fit_2)[3], col = 2, lty = 2)
# Solid lines: interaction model (lm_fit_6), different intercepts and different slopes
abline(a = coef(lm_fit_6)[1], b = coef(lm_fit_6)[3], col = 1)
abline(a = coef(lm_fit_6)[1] + coef(lm_fit_6)[2], b = coef(lm_fit_6)[3] + coef(lm_fit_6)[4], col = 2)
# Legend entries are matched to the lines by line type and colour
legend(
  x = "topleft",
  legend = c("Non-democracy", "Non-democracy w/ Int.", "Democracy", "Democracy w/ Int."),
  lty = c(2, 1, 2, 1),
  col = c(1, 1, 2, 2)
)