Week 6: Visualisations

POP88162 Introduction to Quantitative Research Methods

Tom Paskhalis

Department of Political Science, Trinity College Dublin

So Far

  • Vector is the core data structure in R.
  • Vectors can be one of the main data types (character, numeric, logical).
  • Unlike vectors, which are homogeneous, lists can combine data of different types.
  • Matrices are vectors with an added class and dimensionality attribute.
  • Data frames are lists of equal-sized vectors.
  • Subsetting data frames combines the techniques used for subsetting matrices and lists.
  • Factor variables can be used to represent categorical data in R.
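
As a quick refresher, the points above can be sketched in a few lines of R. All object names and values here are invented for illustration and are not part of the course data.

```r
# Toy values, invented purely for illustration
regime <- c("Autocracy", "Democracy", "Democracy")  # character vector
duration <- c(12, 45, 7)                            # numeric vector
is_democracy <- regime == "Democracy"               # logical vector

# A list can mix data of different types
country <- list(name = "A", duration = 12, democracy = FALSE)

# A matrix is a vector with a dimensionality attribute
m <- matrix(1:6, nrow = 2)

# A data frame is a list of equal-length vectors
df <- data.frame(regime, duration, is_democracy)

# Matrix-style and list-style subsetting both work on data frames
df[df$is_democracy, "duration"]  # rows by logical condition, column by name
df[["regime"]]                   # list-style extraction of a single column

# Factors represent categorical data
df$regime <- factor(df$regime)
levels(df$regime)
```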

Topics for Today

  • plot() function
  • Box plot
  • Good plotting practices
  • ggplot2 package

Plotting in R

  • The main function for plotting in base R is plot().
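  • It is a flexible function: the kind of plot it draws depends on the class of the objects supplied (e.g. two numeric vectors give a scatter plot, a factor on the x-axis gives box plots).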
plot(
  x,
  y,
  type,
  main,
  xlab,
  ylab
)
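
A minimal sketch of how these arguments might be filled in. The vectors years and turnout are hypothetical toy data, not taken from the lecture dataset.

```r
# Hypothetical toy data (not the lecture's democracy_gdp_2020 dataset)
years <- 2015:2020
turnout <- c(61.2, 63.5, 60.1, 64.8, 66.0, 62.3)

plot(
  x = years,
  y = turnout,
  type = "b",  # "p" = points, "l" = lines, "b" = both
  main = "Turnout over Time (toy data)",
  xlab = "Year",
  ylab = "Turnout (%)"
)
```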

Scatterplot

plot(democracy_gdp_2020$democracy, democracy_gdp_2020$democracy_duration)

Prettifying Scatter Plot

plot(
  x = democracy_gdp_2020$democracy,
  y = democracy_gdp_2020$democracy_duration, 
  xlab = "Political Regime", 
  ylab = "Regime Longevity", 
  main = "Democracy in 2020",
  pch = 19, # Solid points
  cex = 0.5, # Smaller points
  bty = "n" # Remove surrounding box
)

Box Plot Example

# Convert democracy indicator into factor
democracy_gdp_2020$democracy <- factor(democracy_gdp_2020$democracy)
plot(democracy_gdp_2020$democracy, democracy_gdp_2020$democracy_duration)
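
The same kind of plot can also be requested explicitly with boxplot() and its formula interface. The sketch below uses a small hypothetical data frame (toy) standing in for democracy_gdp_2020, so the values are invented.

```r
# Hypothetical toy data standing in for democracy_gdp_2020
toy <- data.frame(
  democracy = c(0, 0, 0, 1, 1, 1, 1),
  democracy_duration = c(5, 12, 30, 8, 25, 60, 75)
)
toy$democracy <- factor(toy$democracy, labels = c("Autocracy", "Democracy"))

# Formula interface: outcome ~ grouping factor
bp <- boxplot(
  democracy_duration ~ democracy,
  data = toy,
  xlab = "Political Regime",
  ylab = "Regime Longevity"
)

# boxplot() invisibly returns the statistics it draws;
# row 3 of bp$stats holds the group medians
bp$stats[3, ]
```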

Comparing Pizzas

Pie Charts

  • While broadly popular, pie charts are generally not a good way of presenting information: angles and areas are harder to compare accurately than lengths.

Playfair (1801)

Pie Charts in R

pie(table(democracy_gdp_2020$democracy))

Use Bar Plot Instead

barplot(table(democracy_gdp_2020$democracy))
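
A bar plot of shares rather than raw counts can be produced with prop.table(). This is only a sketch: the democracy vector below is hypothetical toy data, and the labels are assumptions.

```r
# Hypothetical regime indicator (0 = autocracy, 1 = democracy)
democracy <- factor(
  c(0, 1, 1, 0, 1, 1, 1, 0),
  labels = c("Autocracy", "Democracy")
)

counts <- table(democracy)     # frequency of each category
shares <- prop.table(counts)   # convert counts to proportions

barplot(
  shares,
  ylab = "Share of countries",
  main = "Political Regimes (toy data)",
  ylim = c(0, 1)
)
```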

Beyond Base R

  • While extremely powerful, base R plotting facilities are not as flexible as some external libraries.
  • ggplot2 is particularly popular for data visualisations.
  • It is based on the Grammar of Graphics plotting approach:
    • Graphs are broken into multiple layers
    • Layers can be recycled across multiple plots
  • As an external package, ggplot2 needs to be installed once and then loaded in every new R session before use:
install.packages("ggplot2")
library("ggplot2")
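
In a script that is run repeatedly, one possible pattern is to install the package only when it is missing. The repos value below is an assumption (the CRAN cloud mirror); any CRAN mirror would do.

```r
# Install ggplot2 only if it is not already available
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  # The mirror is an assumption; substitute a preferred CRAN mirror
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}

# Loading is still needed in every session
library("ggplot2")
```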

Structure of ggplot calls

  • Creation of ggplot objects in ggplot2 has the following structure:
ggplot(data = <data>) +
    <geom_function>(mapping = aes(<mappings>))
  • If the mappings are re-used across geometric objects (e.g. scatter plot and line):
ggplot(data = <data>, mapping = aes(<mappings>)) +
    <geom_function>() +
    <geom_function>()
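
As a concrete instance of the second template, the sketch below uses mtcars, a dataset that ships with R. It is only a stand-in chosen for illustration, not the lecture data.

```r
library(ggplot2)

# mtcars ships with R; it is used here only as a stand-in dataset
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +  # scatter plot layer
  geom_line()     # line layer re-using the same x and y mappings

# Printing the object draws the plot
p
```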

ggplot Example

democracy_plot <- 
  ggplot(democracy_gdp_2020, aes(x = log(democracy_duration), y = log(gdp_per_capita))) +
  geom_point()
democracy_plot

Adding Layers

# Re-use previously saved plot
democracy_plot +
  # Add a layer with linear regression line
  geom_smooth(method = lm, se = FALSE)
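
Layers themselves can be stored and recycled across plots. A minimal sketch, again assuming the built-in mtcars data as a stand-in; the names trend, p1 and p2 are hypothetical.

```r
library(ggplot2)

# A layer is an ordinary R object, so it can be saved and re-used
trend <- geom_smooth(method = "lm", se = FALSE)

# The same layer is added to two different plots
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() + trend
p2 <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point() + trend
```

Printing p1 or p2 draws the corresponding scatter plot with its own fitted line, since the layer picks up the mappings of the plot it is added to.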

Prettifying ggplot

ggplot(
  democracy_gdp_2020,
  aes(x = log(democracy_duration), y = log(gdp_per_capita))
) +
  geom_point(
    aes(colour = factor(democracy, labels = c("Autocracy", "Democracy")))
  ) +
  geom_smooth(method = lm, se = FALSE) +
  labs(
    x = "Duration of Political Regime (log)",
    y = "GDP per capita (log)",
    colour = "Political Regime"
  ) +
  theme_classic() +
  theme(
    legend.box.background = element_rect(linewidth = 0.5), # `size` is deprecated for lines since ggplot2 3.4.0
    legend.position = c(.2, .8)
  )

Available Geometric Objects

Function                    Description
geom_bar(), geom_col()      Bar charts
geom_boxplot()              Box and whisker plot
geom_histogram()            Histogram
geom_point()                Scatterplot
geom_line(), geom_path()    Lines
geom_map()                  Geographic areas
geom_smooth()               Smoothed conditional means
geom_violin()               Violin plots
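
Switching between these geometric objects usually only requires changing the geom function and the mappings. A short sketch with the built-in mtcars data (a stand-in, not the lecture dataset); box and hist_plot are hypothetical names.

```r
library(ggplot2)

# Box plot: distribution of mpg within each cylinder group
box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# Histogram: distribution of a single variable
hist_plot <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

box
hist_plot
```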

Next

  • Tutorial:
    • Correlation
  • Next week:
    • Reading week
  • After reading week:
    • RQ Presentations