Week 3: Probability Theory

POP88162 Introduction to Quantitative Research Methods

Tom Paskhalis

Department of Political Science, Trinity College Dublin

So Far

  • Quantitative research involves collecting data - a sample of observations selected from a larger population, in which one or more variables are measured for each observation.
  • The goal of collecting data is usually to calculate statistics which can be used to infer parameters of a population.
  • Variables can be measured on different scales, which determine which statistics are applicable.
  • Measures of central tendency describe a typical observation.
  • Measures of variability describe the spread of the variable.

Topics for Today

  • Probability
  • Random variables
  • Probability distributions
  • Normal distribution

Today’s Plan

graph LR
    A(Outcomes &<br>Events) --> B(Probability) --> C(Random<br>Variables) --> D(Probability<br>Distributions)

Probability

Why Probability?

  • Probability describes the uncertainty about our sample.
  • Inference (i.e. statistical inference) allows us to draw conclusions about the population.

flowchart LR
    A((Population))
    B((Sample))
    A-- Probability -->B
    B-- Inference -->A

Origins of Probability Theory

Georges de La Tour, Louvre

Example: Sortition

Imagine a world in which political candidates are selected by lot (sortition).

Three Parties:

  • Left Party 🛠️
  • Right Party 🏦
  • Green Party 🌳

Candidates can be of two genders:

  • Female ♀️
  • Male ♂️

Six possible candidates:

  • 👩🏻‍🔧 (♀️🛠️)
  • 👨🏿‍⚕️ (♂️️🛠️)
  • 🧑🏾‍⚖️(♀️🏦)
  • 🧑🏻‍💼 (♂️🏦)
  • 👩🏽‍🌾 (♀️🌳)
  • 👨🏿‍🎨 (♂️🌳)

Example Continued: Sortition

  • Hypothetical trial: roll a 🎲 to pick a candidate.

  • Uncertainty: we don’t know which candidate will be selected.

  • One possible outcome:

    picking 👩🏽‍🌾 (♀️🌳)

  • Sample space \(S\):

    {👩🏻‍🔧, 👨🏿‍⚕️, 🧑🏾‍⚖️, 🧑🏻‍💼, 👩🏽‍🌾, 👨🏿‍🎨}

  • An event \(A\):

    selecting a ♀️

  • Any of these outcomes would make us say that an event \(A\) has occurred:

    {👩🏻‍🔧, 🧑🏾‍⚖️, 👩🏽‍🌾}

What is Probability?

  • Probability \(P(A)\) represents how likely an event \(A\) is to occur.
  • If all outcomes are equally likely:

\[P(A) = \frac{\text{Number of elements in A}}{\text{Number of elements in }S}\]

  • Sortition example:
    • Probability of selecting ♀️: \(\frac{3}{6} = \frac{1}{2}\)
    • Probability of selecting 🏦: \(\frac{2}{6} = \frac{1}{3}\)
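As a quick illustration in R (which we will use in the workshop), we can encode the sample space as a data frame and compute these probabilities by counting; the object and column names below are chosen purely for illustration.

# Sample space S: the six sortition candidates (names are illustrative)
candidates <- data.frame(
  gender = c("F", "M", "F", "M", "F", "M"),
  party  = c("Left", "Left", "Right", "Right", "Green", "Green")
)

# P(A) = number of outcomes in A / number of outcomes in S
mean(candidates$gender == "F")     # P(selecting a female candidate) = 3/6 = 0.5
mean(candidates$party == "Right")  # P(selecting a Right Party candidate) = 2/6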

The Basics of Probability

  • Probability is a property of events.
  • The same event can occur when different outcomes are observed.
  • One outcome is a draw from all possible outcomes (sample space).
  • We are not interested in events per se, but in their properties.
  • The more individual events we observe, the closer our estimates are to population parameters.

Approaches to Probability

  • Frequentist: long-run frequency over a large number of repeated events.
  • Bayesian: degree of belief about the event in question.
  • In the frequentist view, population parameters are fixed (but unknown).
  • In the Bayesian view, population parameters are themselves random variables.
  • In the rest of this class we will focus only on the frequentist approach.

Frequentists vs Bayesians

Probability Axioms

  • Probabilities are always non-negative:
    • \(P(A) \ge 0\) for any event \(A\)
  • Probabilities of all possible outcomes add up to \(1\):
    • \(P(S) = 1\)
  • If two events \(A\) and \(B\) are mutually exclusive:
    • \(P(A\text{ or }B) = P(A) + P(B)\)

Some Properties of Probability

  • Probability of the complement:
    • \(P(A^{c}) = P(\text{not } A) = 1 - P(A)\)
    • E.g. \(P(\text{not 🏦}) = 1 - P(🏦) = 1 - \frac{1}{3} = \frac{2}{3}\)
    • “Probability of not selecting a candidate from the Right Party is \(\frac{2}{3}\)”
  • General addition rule:
    • \(P(A\text{ or }B) = P(A) + P(B) - P(A\text{ and }B)\)
    • E.g. \(P(\text{♀️ or 🌳}) = P(♀️) + P(️🌳) - P(\text{♀️ and ️🌳}) = \frac{1}{2} + \frac{1}{3} - \frac{1}{6} = \frac{3 + 2 - 1}{6} = \frac{4}{6} = \frac{2}{3}\)
    • “Probability of selecting a woman or a Green Party candidate is \(\frac{2}{3}\)” (see the R sketch below)
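A minimal R sketch, reusing the illustrative candidates data frame from above, confirms both properties numerically.

# Complement rule: P(not Right) = 1 - P(Right)
p_right <- mean(candidates$party == "Right")
1 - p_right                                   # 2/3

# General addition rule: P(female or Green) = P(female) + P(Green) - P(female and Green)
p_female <- mean(candidates$gender == "F")
p_green  <- mean(candidates$party == "Green")
p_both   <- mean(candidates$gender == "F" & candidates$party == "Green")
p_female + p_green - p_both                   # 2/3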

Non-Naive Definition of Probability

  • The definition of probability from above:

\[P(A) = \frac{\text{Number of elements in A}}{\text{Number of elements in }S}\]

is actually rather naive.

  • There are two big problems with it:
    • All outcomes are assumed to be equally likely.
    • All outcomes have to be listed.
  • More generally, we can call probability any function that maps events to real numbers between \(0\) and \(1\) and satisfies the axioms above.

Random Variables

Random Variables

  • How do we map the possible outcomes of sortition to numbers in our data?

  • Using random variables.

  • Consider the sample space:

    {👩🏻‍🔧, 👨🏿‍⚕️, 🧑🏾‍⚖️, 🧑🏻‍💼, 👩🏽‍🌾, 👨🏿‍🎨}

  • Let \(Y\) be the selection of a ♀️ candidate:

    Y(👩🏻‍🔧) = Y(🧑🏾‍⚖️) = Y(👩🏽‍🌾) = 1

    Y(👨🏿‍⚕️) = Y(🧑🏻‍💼) = Y(👨🏿‍🎨) = 0

  • These 0’s and 1’s are what we actually see in our data.

  • In other words, the random variable \(Y\) provides a numerical summary of the candidate draw with our question (selection of a ♀️ candidate) in mind.

  • The source of randomness is that we don’t know which candidate will be selected.
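A short R sketch (again using the illustrative candidates data frame) shows how one sortition draw is turned into a value of \(Y\).

# Simulate one sortition draw and record Y, the indicator of a female candidate
set.seed(123)                                   # for reproducibility
draw <- candidates[sample(nrow(candidates), 1), ]
Y <- as.integer(draw$gender == "F")
Y                                               # 1 if a female candidate was drawn, 0 otherwise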

Random Variables Continued

  • Imagine that instead of being interested in the selection of a ♀️ candidate we are interested in the selection of a ️🌳 candidate.

  • We have the same sample space: {👩🏻‍🔧, 👨🏿‍⚕️, 🧑🏾‍⚖️, 🧑🏻‍💼, 👩🏽‍🌾, 👨🏿‍🎨}

  • But another random variable \(X\) maps the same outcomes differently than \(Y\):

    X(👩🏻‍🔧) = X(🧑🏾‍⚖️) = X(👨🏿‍⚕️) = X(🧑🏻‍💼) = 0

    X(👩🏽‍🌾) = X(👨🏿‍🎨) = 1

  • Alternatively, rather than focussing on ️🌳, we may choose the variable \(X\) to map the selection of a candidate from any party such that:

    X(👩🏻‍🔧) = X(👨🏿‍⚕️) = 1 for 🛠️

    X(🧑🏾‍⚖️) = X(🧑🏻‍💼) = 2 for 🏦

    X(👩🏽‍🌾) = X(👨🏿‍🎨) = 3 for 🌳

  • Note that since these are all categorical variables, the actual numbers assigned by the random variables are somewhat arbitrary (see the sketch below).
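As a sketch, the party mapping can be written in R with a factor; the numeric codes below are as arbitrary as the slide notes.

# Map the same outcomes to party codes: Left = 1, Right = 2, Green = 3
X <- as.integer(factor(candidates$party, levels = c("Left", "Right", "Green")))
X                                               # 1 1 2 2 3 3 -- the labels, not the magnitudes, matter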

Discrete and Continuous Random Variables

  • Imagine we observe one event.
  • We want to know the probability that the random variable associated with this event takes on a certain value.
  • But that depends on the potential values that this random variable can take.
  • This hinges on whether the measurement scale is discrete or continuous.
  • Probability works slightly differently for them.

Example: Discrete Random Variable

Discrete Random Variables

  • A random variable that takes on a countable number of values.
  • Describes the data measured on nominal and ordinal scales.
  • Probability distribution of a discrete random variable assigns probability to each possible value of the variable.
    • E.g. \(P(️🌳) = \frac{1}{3}\)
  • We can write out all the individual probabilities for such variables:
Party   \(P(Y)\)
-----   --------
🛠️      0.33
🏦      0.33
🌳      0.33
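In R, the same table can be obtained by counting outcomes in the (illustrative) candidates data frame.

# Probability distribution of the party variable: one probability per value
prop.table(table(candidates$party))             # each party has probability 1/3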

Example: Continuous Random Variable

Continuous Random Variables

  • What is the probability that someone’s income is exactly \(£39,674.39\)?
  • For specific values it’s always \(0\).
  • Continuous random variables take an infinite number of possible values.
  • The probability distribution for continuous variables assigns probabilities to intervals.
  • So, we can calculate the probability that someone’s income, for example, is between \(£40,000\) and \(£50,000\), or greater than \(£30,000\).
  • Those probabilities are defined with formulas and involve calculus, but we will use R instead (see the sketch below; more in the workshop).
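A minimal sketch, assuming (purely hypothetically) that income follows a normal distribution with mean £35,000 and standard deviation £10,000; pnorm() returns probabilities for intervals.

# Hypothetical parameters, chosen for illustration only
mu    <- 35000
sigma <- 10000

# P(40,000 < income < 50,000)
pnorm(50000, mean = mu, sd = sigma) - pnorm(40000, mean = mu, sd = sigma)

# P(income > 30,000)
1 - pnorm(30000, mean = mu, sd = sigma)

# P(income is exactly 39,674.39) is 0 for a continuous variable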

Example: Continuous Random Variable

Probability Distributions

Probability Distribution

  • Probability distribution assigns probabilities to values taken by random variables.
  • Discrete distributions assign probabilities to individual values.
  • Continuous distributions assign probability to intervals.
  • Probability distributions are defined using mathematical formulas.
  • But graphical representations provide a good intuition.

Discrete Distribution

Continuous Distribution

Normal Probability Distribution

  • The most important probability distribution, as it:
    • Approximates the distribution of many variables in the real world.
    • Is widely used in inferential statistics.
  • It is symmetric, bell-shaped and fully described by its mean \(\mu\) and variance \(\sigma^2\).
  • It can also be called the normal distribution for short.
  • We can denote it as: \[Y \sim N(\mu, \sigma^2)\] “\(Y\) is distributed according to a normal distribution with mean \(\mu\) and variance \(\sigma^2\).”
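A short R sketch: drawing a large sample from \(N(0, 1)\) and overlaying the theoretical density gives a feel for the bell shape (note that rnorm() takes the standard deviation, not the variance).

# Simulate from Y ~ N(0, 1) and compare the histogram with the density curve
set.seed(123)
y <- rnorm(10000, mean = 0, sd = 1)
hist(y, breaks = 50, freq = FALSE, main = "Y ~ N(0, 1)")
curve(dnorm(x, mean = 0, sd = 1), add = TRUE)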

Normal Distribution

Example: Climate Change

Standard Normal Distribution

  • To calculate probabilities for a normal variable with a general mean and variance, we must standardise the variable by first subtracting the mean, then by dividing the result by the standard deviation: \[z = \frac{x - \mu_x}{\sigma_x}\]
  • Z-scores indicate the number of standard deviation units a value is from the mean of a distribution.
  • The standard normal distribution is the normal distribution with mean \(\mu = 0\) and variance \(\sigma^2 = 1\), denoted \(N(0, 1)\).
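A small R sketch with hypothetical values (mean 100, standard deviation 15, chosen only for illustration):

# Standardise a value and read off probabilities from the standard normal
x <- 120; mu <- 100; sigma <- 15
z <- (x - mu) / sigma                # z-score: about 1.33 sd above the mean
pnorm(z)                             # P(Z <= z) under N(0, 1)
pnorm(x, mean = mu, sd = sigma)      # the same probability without standardising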


\(\chi^2\) Distribution

  • The \(\chi^2\) (pronounced chi-squared) distribution.
  • Its shape depends on the degrees of freedom (more later).
  • Used to analyse contingency tables.
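A sketch of how the shape changes with the degrees of freedom (df values chosen for illustration):

# Chi-squared densities for different degrees of freedom
curve(dchisq(x, df = 1), from = 0, to = 15, ylab = "density")
curve(dchisq(x, df = 3), add = TRUE, lty = 2)
curve(dchisq(x, df = 8), add = TRUE, lty = 3)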

\(t\) Distribution

  • The \(t\) distribution is bell-shaped and symmetric around its mean of \(0\).
  • Compared to the standard normal distribution, its standard deviation is slightly larger than \(1\) and depends on the degrees of freedom.
  • Used to compare means of variables between different groups (more later).
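A sketch comparing the \(t\) distribution with the standard normal (df values chosen for illustration):

# The t distribution has heavier tails than N(0, 1), especially for small df
curve(dnorm(x), from = -4, to = 4, ylab = "density")   # standard normal
curve(dt(x, df = 3), add = TRUE, lty = 2)              # t with 3 degrees of freedom
curve(dt(x, df = 30), add = TRUE, lty = 3)             # already close to the normal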

Next

  • Workshop:
    • Probability Distributions
  • Next week:
    • Hypothesis Testing