Week 11: Causation
POP88162 Introduction to Quantitative Research Methods
Summarising Data
Read in the data for the minimum wage study by Card and Krueger (1994). The dataset, minwage.csv, is available on Blackboard.
minwage <- read.csv("../data/minwage.csv")
Let’s start with the usual checks of the dataset’s dimensions, structure, and variable distributions.
str(minwage)
'data.frame': 358 obs. of 8 variables:
$ chain : chr "wendys" "wendys" "burgerking" "burgerking" ...
$ location : chr "PA" "PA" "PA" "PA" ...
$ wageBefore: num 5 5.5 5 5 5.25 5 5 5 5 5.5 ...
$ wageAfter : num 5.25 4.75 4.75 5 5 5 4.75 5 4.5 4.75 ...
$ fullBefore: num 20 6 50 10 2 2 2.5 40 8 10.5 ...
$ fullAfter : num 0 28 15 26 3 2 1 9 7 18 ...
$ partBefore: num 20 26 35 17 8 10 20 30 27 30 ...
$ partAfter : num 36 3 18 9 12 9 25 32 39 10 ...
summary(minwage)
    chain             location           wageBefore      wageAfter
 Length:358         Length:358         Min.   :4.250   Min.   :4.250
 Class :character   Class :character   1st Qu.:4.250   1st Qu.:5.050
 Mode  :character   Mode  :character   Median :4.500   Median :5.050
                                       Mean   :4.618   Mean   :4.994
                                       3rd Qu.:4.987   3rd Qu.:5.050
                                       Max.   :5.750   Max.   :6.250
   fullBefore       fullAfter        partBefore      partAfter
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.00   Min.   : 0.00
 1st Qu.: 2.125   1st Qu.: 2.000   1st Qu.:11.00   1st Qu.:11.00
 Median : 6.000   Median : 6.000   Median :16.25   Median :17.00
 Mean   : 8.475   Mean   : 8.362   Mean   :18.75   Mean   :18.69
 3rd Qu.:12.000   3rd Qu.:12.000   3rd Qu.:25.00   3rd Qu.:25.00
 Max.   :60.000   Max.   :40.000   Max.   :60.00   Max.   :60.00
Subsetting Data
To simplify the ensuing analysis, we will create two separate data frames: one containing fast-food restaurants in New Jersey and another containing restaurants in Pennsylvania.
First, note how location is coded in the dataset. There is a single code for Pennsylvania (PA), but several codes for different parts of New Jersey (e.g. northNJ, shoreNJ).
table(minwage$location)
centralNJ northNJ PA shoreNJ southNJ
45 146 67 33 67
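As a rough sketch of how these codes could be collapsed into a two-level state indicator, the snippet below rebuilds the location column from the counts printed above rather than reading the file. The names location and state are illustrative assumptions and are not part of the original dataset.

```r
# Rebuild a location vector from the counts in the table above
# (illustration only; with the real data you would use minwage$location)
location <- rep(
  c("centralNJ", "northNJ", "PA", "shoreNJ", "southNJ"),
  times = c(45, 146, 67, 33, 67)
)

# Hypothetical two-level indicator: everything that is not "PA" is in NJ
state <- ifelse(location == "PA", "PA", "NJ")

# Cross-tabulate to confirm that every NJ region maps to "NJ"
table(location, state)
table(state)
```

The four NJ counts sum to 291, which together with the 67 PA restaurants gives the 358 rows reported by str().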
To split the data into two data frames, one for each state, we can use the already familiar subsetting operations or, as here, the subset() function.
minwageNJ <- subset(minwage, subset = (location != "PA"))
minwagePA <- subset(minwage, subset = (location == "PA"))
These two subset() calls are equivalent to the following subsetting operations:
minwageNJ <- minwage[minwage$location != "PA", ]
minwagePA <- minwage[minwage$location == "PA", ]
As a first substantive data check, let’s examine what proportion of fast-food restaurants paid less than $5.05 before and after the new minimum wage was set at this level in NJ.
# NJ before
mean(minwageNJ$wageBefore < 5.05)
[1] 0.9106529
# NJ after
mean(minwageNJ$wageAfter < 5.05)
[1] 0.003436426
# PA before
mean(minwagePA$wageBefore < 5.05)
[1] 0.9402985
# PA after
mean(minwagePA$wageAfter < 5.05)
[1] 0.9552239
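The calls above rely on the fact that the mean of a logical vector is the share of TRUE values. A minimal sketch with a made-up wage vector (toy_wage and its values are hypothetical and not taken from the dataset) may make this explicit:

```r
# Hypothetical hourly wages for four restaurants (not from minwage.csv)
toy_wage <- c(4.25, 4.75, 5.05, 5.25)

# The comparison returns a logical vector: TRUE where the wage is below 5.05
below <- toy_wage < 5.05

# TRUE counts as 1 and FALSE as 0, so sum() counts and mean() gives the share
n_below <- sum(below)
share_below <- mean(below)

# The complementary share covers wages at or above 5.05
share_at_or_above <- mean(toy_wage >= 5.05)

n_below
share_below
share_at_or_above
```

Note that a wage of exactly 5.05 is not counted by the strict comparison, so the two shares always sum to one.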
Difference-in-means Analysis
Let’s start our analysis with a simple difference-in-means comparison between NJ and PA after the introduction of the new minimum wage in NJ.
First, we will create a new variable, fte_prop_after, which gives the proportion of full-time employees in each restaurant after the change.
minwageNJ$fte_prop_after <- minwageNJ$fullAfter/(minwageNJ$fullAfter + minwageNJ$partAfter)
minwagePA$fte_prop_after <- minwagePA$fullAfter/(minwagePA$fullAfter + minwagePA$partAfter)
We can now calculate the difference in means.
mean(minwageNJ$fte_prop_after) - mean(minwagePA$fte_prop_after)
[1] 0.04811886
We can then run a statistical test on this difference.
t.test(minwageNJ$fte_prop_after, minwagePA$fte_prop_after)
Welch Two Sample t-test
data: minwageNJ$fte_prop_after and minwagePA$fte_prop_after
t = 1.4322, df = 99.761, p-value = 0.1552
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.01854186 0.11477959
sample estimates:
mean of x mean of y
0.3204010 0.2722821
What is your substantive conclusion given this output?
Now, instead of using a t-test, fit a linear regression model to estimate this difference. Rather than working with two separate datasets for NJ and PA, for this task you might want to modify the full minwage dataset.
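To see why a regression can stand in for a difference in means, here is a small sketch on simulated data rather than on the minimum wage data. The data frame toy, its columns treated and outcome, and all simulated values are assumptions made purely for illustration.

```r
set.seed(1)

# Simulated data: 40 control and 60 treated units (values are made up)
toy <- data.frame(
  treated = rep(c(0, 1), times = c(40, 60)),
  outcome = c(rnorm(40, mean = 0.27, sd = 0.2),
              rnorm(60, mean = 0.32, sd = 0.2))
)

# Difference in group means computed directly
diff_means <- mean(toy$outcome[toy$treated == 1]) -
  mean(toy$outcome[toy$treated == 0])

# Regression of the outcome on the binary indicator
fit <- lm(outcome ~ treated, data = toy)

# The slope equals the difference in means; the intercept is the control mean
slope <- unname(coef(fit)["treated"])
all.equal(slope, diff_means)
```

The point estimate matches exactly, but the standard error from lm() assumes equal variances in the two groups, so the p-value can differ slightly from the Welch t-test.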
Before-and-after Analysis
As we discussed in the lecture, rather than comparing post-change restaurants in NJ to their counterparts in PA, we might instead compare restaurants in NJ to themselves prior to the change in minimum wage.
First, let’s create a new variable that captures the proportion of full-time employees in each fast-food restaurant prior to the change.
minwageNJ$fte_prop_before <- minwageNJ$fullBefore/(minwageNJ$fullBefore + minwageNJ$partBefore)
We can now calculate the difference in means before and after the new law.
mean(minwageNJ$fte_prop_after) - mean(minwageNJ$fte_prop_before)
[1] 0.02387474
And, as usual, run a statistical test on this difference.
t.test(minwageNJ$fte_prop_after, minwageNJ$fte_prop_before)
Welch Two Sample t-test
data: minwageNJ$fte_prop_after and minwageNJ$fte_prop_before
t = 1.1952, df = 575.82, p-value = 0.2325
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.01535869 0.06310817
sample estimates:
mean of x mean of y
0.3204010 0.2965262
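The test above treats the before and after measurements as two independent samples. Because the same restaurants are observed twice, one possible variation, which is not the approach used in this handout, is a paired t-test. The sketch below uses simulated values; before, after, and every number in it are hypothetical.

```r
set.seed(2)

# Simulated proportions for 30 units measured twice (made-up values)
before <- runif(30, min = 0.1, max = 0.5)
after <- before + rnorm(30, mean = 0.02, sd = 0.1)

# Paired test: works on the within-unit differences
paired <- t.test(after, before, paired = TRUE)

# Equivalent one-sample test on the differences themselves
one_sample <- t.test(after - before)

# Both report the same estimate and the same p-value
unname(paired$estimate)
all.equal(paired$p.value, one_sample$p.value)
```

The estimated mean difference is identical to the one from the unpaired comparison; only the standard error, and hence the p-value and confidence interval, change.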
Difference-in-difference (DiD) Analysis
Finally, let’s conduct a difference-in-difference analysis as discussed in the lecture. Recall that, relative to the before-and-after design, it addresses confounding bias due to a time trend and, relative to the simple difference in means between the two states, it can (at least partially) address state-specific confounding.
Note that the DiD design requires two differences (hence, DiD!). First, we calculate the before-and-after difference in one state, say, NJ.
NJdiff <- mean(minwageNJ$fte_prop_after) - mean(minwageNJ$fte_prop_before)
Next, we repeat the same for the other group, namely, fast-food restaurants in PA.
minwagePA$fte_prop_before <- minwagePA$fullBefore/(minwagePA$fullBefore + minwagePA$partBefore)
PAdiff <- mean(minwagePA$fte_prop_after) - mean(minwagePA$fte_prop_before)
And, finally, we can calculate our difference-in-difference estimate.
NJdiff - PAdiff
[1] 0.06155831
We can also test the significance of this effect with a t-test.
t.test(
minwageNJ$fte_prop_after - minwageNJ$fte_prop_before,
minwagePA$fte_prop_after - minwagePA$fte_prop_before
)
Welch Two Sample t-test
data: minwageNJ$fte_prop_after - minwageNJ$fte_prop_before and minwagePA$fte_prop_after - minwagePA$fte_prop_before
t = 1.3526, df = 90.777, p-value = 0.1796
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.02884997 0.15196659
sample estimates:
mean of x mean of y
0.02387474 -0.03768357
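As a quick arithmetic check, the two sample estimates printed by the t-test can be plugged back into the DiD formula. The variable names nj_diff, pa_diff, and did are illustrative; the two numbers are copied from the output above.

```r
# Before-and-after changes reported as "mean of x" and "mean of y" above
nj_diff <- 0.02387474   # NJ: after minus before
pa_diff <- -0.03768357  # PA: after minus before

# DiD estimate: NJ change minus PA change
did <- nj_diff - pa_diff
did
```

Subtracting a negative PA change adds to the NJ change, which is why the DiD estimate (about 0.0616) is larger than the NJ before-and-after difference alone.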