POP77032 Quantitative Text Analysis for Social Scientists
\[ Y = f(X_1, X_2) = \begin{cases} 1 & \text{if } X_1 = 1 \text{ and } X_2 = 1 \\ 0 & \text{otherwise} \end{cases} \]
| \(X_1\) | \(X_2\) | \(Y = X_1 \text{ AND } X_2\) |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
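AND is linearly separable: a single thresholded linear function reproduces the truth table. A minimal numpy sketch (the cutoff 1.5 is one convenient choice, not from the slides):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
# The line x1 + x2 - 1.5 = 0 separates (1, 1) from the other three inputs
(X[:, 0] + X[:, 1] - 1.5 >= 0).astype(int)  # array([0, 0, 0, 1])
```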
Fitted values from a linear regression of \(Y\) on \(X_1\) and \(X_2\) for the AND data:

    1     2     3     4
-0.25  0.25  0.25  0.75

Thresholding these fitted values at 0.5 reproduces the AND column exactly; ifelse() is, effectively, the simplest possible activation function.

The XOR function is harder: no single linear decision boundary separates its outputs.

\[ Y = f(X_1, X_2) = \begin{cases} 1 & \text{if } (X_1 = 1 \text{ and } X_2 = 0) \text{ or } (X_1 = 0 \text{ and } X_2 = 1) \\ 0 & \text{otherwise} \end{cases} \]
| \(X_1\) | \(X_2\) | \(Y = X_1 \text{ XOR } X_2\) |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
Fitting a linear model to the XOR data makes the problem visible: the slopes are numerically zero and the fit explains none of the variation.

Call:
lm(formula = Y ~ X1 + X2, data = xor_data)
Residuals:
1 2 3 4
-0.5 0.5 0.5 -0.5
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.00e-01   8.66e-01   0.577    0.667
X1          1.11e-16   1.00e+00   0.000    1.000
X2          0.00e+00   1.00e+00   0.000    1.000
Residual standard error: 1 on 1 degrees of freedom
Multiple R-squared: 3.698e-32, Adjusted R-squared: -2
F-statistic: 1.849e-32 on 2 and 1 DF, p-value: 1
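The same flat fit can be reproduced in numpy (a sketch using lstsq in place of lm()):

```python
import numpy as np

X_xor = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])  # intercept, X1, X2
y_xor = np.array([0, 1, 1, 0])
np.linalg.lstsq(X_xor, y_xor, rcond=None)[0]
# approximately array([0.5, 0., 0.]): intercept 0.5, slopes ~0, matching lm()
```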
A logistic regression does no better: every coefficient is numerically zero and the residual deviance equals the null deviance, so the model has learned nothing.

Call:
glm(formula = Y ~ X1 + X2, family = binomial(link = "logit"),
data = xor_data)
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.441e-16  1.732e+00       0        1
X1           4.441e-16  2.000e+00       0        1
X2           8.882e-16  2.000e+00       0        1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 5.5452 on 3 degrees of freedom
Residual deviance: 5.5452 on 1 degrees of freedom
AIC: 11.545
Number of Fisher Scoring iterations: 2
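The logistic fit fails for the same reason. Starting gradient descent at zero weights goes nowhere, because the gradient of the log-likelihood is exactly zero there (a numpy sketch, not from the slides):

```python
import numpy as np

X_xor = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)  # same XOR design as above
y_xor = np.array([0., 1., 1., 0.])
beta = np.zeros(3)
for _ in range(1000):
    p = 1 / (1 + np.exp(-X_xor @ beta))   # predicted probabilities
    beta += 0.1 * X_xor.T @ (y_xor - p)   # ascend the log-likelihood
beta  # stays at array([0., 0., 0.]): every update is exactly zero
```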
Interactive demos of gradient descent:

- [Gradient descent for 1 variable](https://tpaskhalis-gradient-descent.share.connect.posit.cloud/#/gradient-descent-for-1-variable)
- [Gradient descent for 2 variables](https://tpaskhalis-gradient-descent.share.connect.posit.cloud/#/gradient-descent-for-2-variables)
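For reference, the closed-form OLS estimate that the next sentence refers to. The data-generating step is an assumption (not shown on the slide), chosen so the true coefficients sit near the SGD estimates printed below:

```python
import numpy as np

# Simulated data: true coefficients are (1.5, -2.0)
rng = np.random.default_rng(42)
N = 1000
X = np.column_stack([np.ones(N), rng.normal(size=N)])  # intercept + one regressor
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=N)

# Closed-form OLS: (X'X)^{-1} X'y
betas_ols = np.linalg.inv(X.T @ X) @ X.T @ y
```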
np.linalg.inv(X.T @ X) can, however, be computationally expensive for large \(N\). Stochastic gradient descent avoids the inversion entirely, taking small steps against the gradient computed on random minibatches:

betas = np.array([0.0, 0.0])  # initial weights
lr = 0.01                     # learning rate
batch_size = 20

for epoch in range(1000):
    # Sample a random minibatch of data (X, y as simulated above)
    indices = np.random.choice(len(y), size=batch_size, replace=False)
    X_batch, y_batch = X[indices], y[indices]
    y_pred = X_batch @ betas
    residuals = y_batch - y_pred
    gradient = -2 * X_batch.T @ residuals / batch_size  # gradient of the MSE
    betas = betas - lr * gradient
betas
array([ 1.49090973, -2.06441368])
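As a sanity check (not on the slide), the minibatch estimates should land close to the exact least-squares solution:

```python
np.linalg.lstsq(X, y, rcond=None)[0]  # should be close to the SGD betas above
```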
def step(z):
    return np.where(z >= 0, 1.0, 0.0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])   # AND labels

params = np.zeros(3)  # initial weights (w1, w2, b)
lr = 0.1              # learning rate
batch_size = 10

for epoch in range(1000):
    indices = np.random.choice(len(y), size=batch_size, replace=True)
    X_batch, y_batch = X[indices], y[indices]
    z = X_batch @ params[:2] + params[2]   # linear step
    y_pred = step(z)                       # step activation
    residuals = y_batch - y_pred           # perceptron error (0 or ±1)
    gradient_w = -2 * X_batch.T @ residuals / batch_size
    gradient_b = -2 * np.sum(residuals) / batch_size
    params[:2] = params[:2] - lr * gradient_w
    params[2] = params[2] - lr * gradient_b
params
array([ 0.12,  0.14, -0.16])
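A quick check (not on the slide) that these weights implement AND: pushing the four inputs back through the learned unit reproduces the truth table.

```python
step(X @ params[:2] + params[2])
# array([0., 0., 0., 1.]): matches Y = X1 AND X2
```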
# Rectified Linear Unit (ReLU)
def relu(z):
    return np.maximum(0, z)

# Sigmoid activation function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])
rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.5, size=(2, 2)) # hidden weights (2×2)
b1 = np.zeros(2) # hidden biases
w2 = rng.normal(scale=0.5, size=2) # output weights
b2 = 0.0 # output bias
lr = 0.1
batch_size = 10
for epoch in range(1000):
    indices = np.random.choice(len(y), size=batch_size, replace=True)
    X_batch, y_batch = X[indices], y[indices]
    z1 = X_batch @ W1 + b1                # hidden pre-activation
    h = relu(z1)                          # hidden layer (ReLU)
    y_pred = sigmoid(h @ w2 + b2)         # output (sigmoid)
    residuals = y_batch - y_pred
    sig_d = y_pred * (1 - y_pred)         # sigmoid derivative
    delta = np.outer(residuals * sig_d, w2) * (z1 >= 0)  # backpropagate through ReLU
    gradient_W1 = -2 * X_batch.T @ delta / batch_size
    gradient_b1 = -2 * np.sum(delta, axis=0) / batch_size
    gradient_w2 = -2 * h.T @ (residuals * sig_d) / batch_size
    gradient_b2 = -2 * np.sum(residuals * sig_d) / batch_size
    W1 -= lr * gradient_W1; b1 -= lr * gradient_b1
    w2 -= lr * gradient_w2; b2 -= lr * gradient_b2

Hidden pre-activations for the four XOR inputs after training:

X @ W1 + b1
array([[ 3.34418151e-03, -1.10023276e-03],
[ 1.50855371e+00, -1.63144001e+00],
[-1.49960507e+00, 1.63064791e+00],
[ 5.60445195e-03, 3.08134648e-04]])
After ReLU, each hidden unit is active essentially only at one of the two inputs where XOR equals 1: the first unit at (0, 1), the second at (1, 0).

relu(X @ W1 + b1)
array([[3.34418151e-03, 0.00000000e+00],
[1.50855371e+00, 0.00000000e+00],
[0.00000000e+00, 1.63064791e+00],
[5.60445195e-03, 3.08134648e-04]])
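Completing the forward pass (a check, not on the slide) should recover the XOR truth table once the sigmoid outputs are rounded:

```python
y_hat = sigmoid(relu(X @ W1 + b1) @ w2 + b2)
np.round(y_hat)  # should give array([0., 1., 1., 0.]) if training converged
```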