Week 3: Distributions & Sampling

POP88162 Introduction to Quantitative Research Methods

Working with Vectors

Let’s start by generating a random draw of 100 values from a standard normal distribution.

draws <- rnorm(100)

Printing all 100 elements of this vector would be too verbose, so let’s check just its top and bottom. The functions head() and tail() work on many data structures in R (vectors, matrices, data frames, lists) and by default show the first and last six elements.

head(draws)
[1] -0.47099084  1.17485754  0.05432581  0.62012632  0.22090200  1.50798920
tail(draws)
[1] -1.1062816  0.5314540  0.8338968  0.6880475 -0.1950053  0.2690703
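The n argument changes how many elements head() and tail() return. A short sketch, using a hypothetical vector v as a stand-in for `draws` so that it runs on its own:

```r
v <- rnorm(100)              # hypothetical stand-in for `draws`
head(v, n = 3)               # first three elements
tail(v, n = 3)               # last three elements
# For a plain vector, head() and tail() are shorthand for positional subsetting
identical(head(v, 3), v[1:3])
identical(tail(v, 3), v[98:100])
```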

Let’s practice more vector subsetting. How many of the numbers in this random draw are larger than 1?

draws > 1
  [1] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
 [13] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
 [25] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE
 [37]  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
 [49] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
 [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE
 [73]  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE
 [85] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [97] FALSE FALSE FALSE FALSE
draws[draws > 1]
 [1] 1.174858 1.507989 1.731882 1.993038 1.152477 1.185130 1.055924 1.592587
 [9] 1.794144 2.461424 1.764529 1.070837 1.951364 1.549949 1.487158 1.695294
[17] 1.373176 1.364248 1.012640 1.613589
length(draws[draws > 1])
[1] 20
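Because R treats TRUE as 1 and FALSE as 0 in arithmetic, a logical vector can be summarised directly: sum() counts the matches and mean() gives their proportion. A sketch with a hypothetical stand-in vector v:

```r
v <- rnorm(100)                 # hypothetical stand-in for `draws`
n_above <- sum(v > 1)           # number of values larger than 1
p_above <- mean(v > 1)          # proportion of values larger than 1
n_above == length(v[v > 1])     # TRUE: same count as the subsetting approach
all.equal(p_above, n_above / length(v))  # TRUE: proportion = count / length
```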

What proportion of the random numbers is smaller than -3?

Working with Distributions

Let’s plot the density of a standard normal distribution. First, let’s replicate the plot from the workshop using the dnorm() function.

x <- seq(-5, 5, 0.1)
plot(x, dnorm(x), type = "l")
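dnorm() returns the height of the density curve at each x, not a probability. Two quick checks make this concrete: the standard normal curve peaks at 0 with height 1 / sqrt(2 * pi), and the total area under it is 1, which a simple sum of heights times the 0.1 grid step approximates:

```r
x <- seq(-5, 5, 0.1)
dnorm(0)                                 # peak height, about 0.3989
all.equal(dnorm(0), 1 / sqrt(2 * pi))    # TRUE
# Riemann-sum approximation of the area under the curve on [-5, 5]
area <- sum(dnorm(x)) * 0.1
area                                     # very close to 1
```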

We can also approximate this curve from the random draws above using R’s built-in density() function, which computes a kernel density estimate. The resulting plot is much rougher than the one above because the estimate rests on a small random draw of only 100 observations. Try increasing the size of the random draw to see whether you can make the plot look closer to an ‘ideal’ normal distribution.

dens <- density(draws)
plot(dens)
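One way to see the effect of sample size is to estimate the density from a much larger draw and overlay the theoretical curve. A sketch assuming 10,000 draws (an arbitrary choice) and a fixed seed only for reproducibility:

```r
set.seed(123)                      # arbitrary seed, for reproducibility only
big_draws <- rnorm(10000)          # 100 times more observations than `draws`
big_dens <- density(big_draws)     # kernel density estimate (512 grid points by default)
plot(big_dens, main = "Kernel density estimate, n = 10,000")
# Overlay the theoretical N(0, 1) density as a dashed line
curve(dnorm(x), from = -5, to = 5, add = TRUE, lty = 2)
# Largest vertical gap between the estimate and the true density
gap <- max(abs(big_dens$y - dnorm(big_dens$x)))
gap
```

Rerunning with rnorm(100) instead typically gives a visibly larger gap.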

Plot a normal distribution with mean 10 and standard deviation 3.
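dnorm() accepts mean and sd arguments, and the x grid has to cover the region where the density is non-negligible (roughly the mean plus or minus four standard deviations). A sketch with illustrative values mean = -2 and sd = 0.5, chosen so that the exercise above still stands:

```r
mu <- -2      # illustrative mean, not the exercise's value
sigma <- 0.5  # illustrative standard deviation
# Grid covering roughly mu +/- 4 standard deviations
x <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 200)
plot(x, dnorm(x, mean = mu, sd = sigma), type = "l")
# The peak sits at the mean, with height 1 / (sigma * sqrt(2 * pi))
peak <- dnorm(mu, mean = mu, sd = sigma)
peak
```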

Calculating probability

Following the example from the workshop, calculate the probability of observing a value larger than 2 for a variable that has a standard normal distribution.
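In base R, pnorm(q) returns P(X ≤ q), the area to the left of q. The area to the right is its complement, available either as 1 - pnorm(q) or with lower.tail = FALSE. A sketch using a threshold of 1 (to leave the exercise’s threshold of 2 untouched), plus a simulation check against random draws:

```r
p_left <- pnorm(1)                        # P(X <= 1), about 0.841
p_right <- pnorm(1, lower.tail = FALSE)   # P(X > 1), about 0.159
all.equal(p_right, 1 - p_left)            # TRUE: the two areas sum to 1
# The share of many random draws above 1 should be close to p_right
set.seed(1)                               # arbitrary seed, for reproducibility only
sim_share <- mean(rnorm(1e5) > 1)
sim_share
```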

Sampling

The function sample() comes in handy whenever we need to draw a random sample from a dataset or from an individual variable.

# Here, we are drawing a random sample of 5 from the vector `draws` created above
sample(draws, size = 5)
[1] -0.97329047  0.70163229  0.04630917  0.20104785  0.80932383
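By default sample() draws without replacement, so no element is picked twice, and the result changes on every run. A sketch of two common options, set.seed() for reproducibility and replace = TRUE for drawing with replacement (as in a bootstrap resample); the small vector v is hypothetical:

```r
v <- c(2.5, -0.3, 1.1, 0.7, -1.8)   # hypothetical small vector

set.seed(2024)
first <- sample(v, size = 3)
set.seed(2024)
second <- sample(v, size = 3)
identical(first, second)             # TRUE: same seed, same sample

# Without replacement, size cannot exceed length(v); with replacement it can,
# and the same element may appear more than once
boot <- sample(v, size = 10, replace = TRUE)
length(boot)                         # 10
all(boot %in% v)                     # TRUE: every value comes from v
```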

Here is how we can draw a random sample of rows from a data frame.

# with() evaluates the expression inside the `dens` object, so x and y
# refer to dens$x and dens$y. Without it, since `dens` is a list, we would
# have to write data.frame(x = dens$x, y = dens$y).
dd <- with(dens, data.frame(x, y))
# We are instructing R to draw a random sample of 10
# from the vector of row indices 1:nrow(dd)
dd[sample(1:nrow(dd), 10),]
             x           y
108 -1.7837622 0.066271437
321  0.9944541 0.321707079
416  2.2335647 0.053260157
480  3.0683339 0.002577327
426  2.3639974 0.036583386
335  1.1770599 0.260614418
202 -0.5576949 0.245482664
430  2.4161705 0.031333374
68  -2.3054930 0.026434727
487  3.1596368 0.001511546
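Sampling row indices keeps each row intact, so the columns stay paired. sample() also has a documented quirk worth knowing: when its first argument is a single number n ≥ 1, it samples from 1:n. That makes sample(nrow(dd), 10) an equivalent shortcut, but it can surprise you when sampling from a vector that happens to have length one. A sketch with a hypothetical data frame df in place of dd:

```r
df <- data.frame(id = 1:6, value = c(10, 20, 30, 40, 50, 60))  # hypothetical data

set.seed(7)                   # arbitrary seed, for reproducibility only
rows <- sample(nrow(df), 3)   # equivalent to sample(1:nrow(df), 3)
df[rows, ]                    # whole rows are kept, so id and value stay paired

# The quirk: with a length-one numeric input, sample() draws from 1:n
x <- 5
sample(x, 1)                  # some integer from 1 to 5, not necessarily 5
x[sample(length(x), 1)]       # sampling positions instead always returns 5
```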

Read in the democracy_2020.csv dataset and draw a random sample of 50 regimes.