Week 1: Introduction to Python¶

Python for Social Data Science¶

Tom Paskhalis¶

Overview¶

  • Backstory
  • Jupyter Notebook
  • Markdown
  • Operators

Source: xkcd

R/Stata/SPSS is great, why learn Python?¶

  • Python is free and open source
  • Python is a truly versatile programming language
  • Python offers a great library ecosystem (>300K packages on PyPI)
  • Python is widely used in the industry
  • Python is well-known outside academia/data science

Popularity of programming languages¶

Source: TIOBE

Popularity of data analysis software¶

Source: Kaggle 2021 State of Data Science and Machine Learning survey

Python and Development Environments¶

  • There are a number of integrated development environments (IDEs) available for Python (IDLE, Spyder, PyCharm)
  • As well as code editors with Python-specific extensions (Visual Studio Code, Atom, Sublime Text, Vim)
  • Try different ones and choose what works best for you!

Python and Jupyter Notebook¶

  • Jupyter Notebook is a language-agnostic, web-based interactive computational environment
  • It is available with backends (kernels) for different programming languages (Julia, Python, R = Jupyter)
  • Can be used both locally and remotely
  • Good for ad-hoc data analysis and visualization

Jupyter Notebook Installation¶

  • There are two main ways to install Jupyter Notebook locally: pip and conda. Unless you have prior experience with Python, I recommend installing the Anaconda distribution, which contains all the packages required for this course.
  • Alternatively, you may choose to use Kaggle Code or Google Colab, online platforms for hosting Jupyter Notebooks. Their interfaces are slightly different and you need to register on Kaggle or have a Google account, but they do not require any local installation.
  • However, for this module and the course more broadly, I recommend installing the toolchain for working with Jupyter Notebooks locally.

Starting Jupyter¶

  • To start Jupyter, open CLI/Terminal and type jupyter notebook
  • This will open a browser window with Jupyter Notebook displaying the directory in which you executed the command above.
  • To create a new notebook press New and select Python from the drop-down menu

Using Jupyter¶

  • In order to run a Python command, create a new cell:
    • Press ➕ in the toolbar or click Insert, Insert Cell Below
    • Make sure that in the drop-down menu on the toolbar you select Code
    • Press CTRL+ENTER to run a command
  • Rather than running a Python command, you can also write Markdown in the cell (e.g. to create slides)
    • Select Markdown in the drop-down menu on the toolbar
    • Write Markdown (check Markdown Cheatsheet)
    • Press CTRL+ENTER to render Markdown cell

Jupyter Notebook Demonstration¶

Jupyter Notebook 1

Jupyter Notebook Demonstration¶

Jupyter Notebook 2

Stopping Jupyter Notebook¶

  • First, make sure you have saved your work (!) by pressing Command+S / CTRL+S
  • You can close the running notebook by clicking File and then Close and Halt
  • Jupyter Notebook runs as a server
  • Which means that closing its tabs/web browser does not stop it
  • You need to press Quit in the upper right corner of your main Jupyter tab (located at http://localhost:8888/)
  • Alternatively, you can press CTRL+C in the terminal window

Python background¶

Source: Guido van Rossum, Python Software Foundation

  • Started as a side-project in 1989 by Guido van Rossum, BDFL (benevolent dictator for life) until 2018.
  • Python 3, first released in 2008, is the current major version
  • Python 2 support stopped on 1 January 2020

The Zen of Python¶

In [1]:
import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Python basics¶

  • Python is an interpreted language (like R and Stata)
  • Every program is executed one command (aka statement) at a time
  • Which also means that work can be done interactively
In [1]:
print("Hello World!")
Hello World!

Python conceptual hierarchy¶

Python programs can be decomposed into modules, statements, expressions, and objects, as follows:

  1. Programs are composed of modules.
  2. Modules contain statements.
  3. Statements contain expressions.
  4. Expressions create and process objects.
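
The four levels above can be seen in a few lines of code. This is only an illustrative sketch: the names `radius` and `area` are made up for the example, and `math` is simply one convenient module from the standard library.

```python
# A module is a file of Python code; `math` ships with Python.
import math                      # a statement that loads a module

radius = 2                       # a statement (assignment); 2 is an int object
area = math.pi * radius ** 2     # the right-hand side is an expression
                                 # that creates a new float object

print(type(math).__name__)       # module
print(type(radius).__name__)     # int
print(type(area).__name__)       # float
```

Each line is one statement, and each statement contains at least one expression that produces an object.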

Python objects¶

  • Everything that Python operates on is an object.
  • This includes numbers, strings, data structures, functions, etc.
  • Each object has a type (e.g. string or function) and internal data.
  • Objects can be mutable (e.g. list) or immutable (e.g. string).
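
A short sketch of what types and mutability mean in practice. The variable names (`numbers`, `word`, `error_message`) are hypothetical and chosen only for this illustration.

```python
# Every value is an object with a type, including functions.
print(type(42).__name__)       # int
print(type("abc").__name__)    # str
print(type(len).__name__)      # builtin_function_or_method

# Lists are mutable: an element can be replaced in place.
numbers = [1, 2, 3]
numbers[0] = 10
print(numbers)                 # [10, 2, 3]

# Strings are immutable: trying to change a character raises TypeError.
word = "abc"
error_message = None
try:
    word[0] = "x"
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)
```

To "change" a string, you build a new one instead, for example `"x" + word[1:]`.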

Operators¶

Objects and operators are combined to form expressions. Key operators are:

  • Arithmetic (+, -, *, **, /, //, %)
  • Boolean (and, or, not)
  • Relational (==, !=, >, >=, <, <=)
  • Assignment (=, +=, -=, *=, /=)
  • Membership (in)

Basic mathematical operations in Python¶

In [2]:
1 + 1
Out[2]:
2
In [3]:
5 - 3
Out[3]:
2
In [4]:
6 / 2
Out[4]:
3.0
In [5]:
4 * 4
Out[5]:
16
In [6]:
# Exponentiation <- Python comments start with #
2 ** 4
Out[6]:
16

Advanced mathematical operations in Python¶

In [8]:
# Integer division (remainder is discarded)
7 // 3
Out[8]:
2
In [9]:
# Modulo operation (only remainder is retained)
7 % 3
Out[9]:
1
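
As a small sketch of how the two operators relate: the quotient and the remainder always recombine into the original number. The names `quotient` and `remainder` are illustrative, and the negative-number case is an extra detail added here rather than something from the slides.

```python
quotient = 7 // 3      # 2
remainder = 7 % 3      # 1

# Quotient and remainder recombine into the original number.
print(quotient * 3 + remainder == 7)   # True

# The built-in divmod() returns both results at once.
print(divmod(7, 3))                    # (2, 1)

# With negative numbers, // rounds down (towards negative infinity),
# not towards zero, and % adjusts accordingly.
print(-7 // 3)                         # -3
print(-7 % 3)                          # 2
```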

Basic logical operations in Python¶

In [7]:
3 != 1 # Not equal
Out[7]:
True
In [8]:
3 > 3 # Greater than
Out[8]:
False
In [9]:
3 >= 3 # Greater than or equal
Out[9]:
True
In [10]:
False or True # True if either first or second operand is True, False otherwise
Out[10]:
True
In [11]:
3 > 3 or 3 >= 3 # Combining two Boolean expressions
Out[11]:
True
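
The cells above use `or` only. A minimal sketch of the remaining Boolean operators, `and` and `not`, plus how they combine, might look like this. The precedence note (`not` binds tightest, then `and`, then `or`) is standard Python behaviour added here for completeness.

```python
print(3 == 3)            # True  (equal)
print(True and False)    # False (True only if both operands are True)
print(not True)          # False (negation)
print(not 3 > 3)         # True  (the comparison is evaluated first)

# Operator precedence: not, then and, then or.
first = True or False and False      # read as: True or (False and False)
second = (True or False) and False   # parentheses change the grouping
print(first)             # True
print(second)            # False
```

When in doubt, parentheses make the intended grouping explicit.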

Assignment operations¶

Assignments create object references. The target (or name) on the left is assigned to the object on the right.

In [12]:
x = 3
In [13]:
x
Out[13]:
3
In [14]:
x += 2 # Increment assignment, equivalent to x = x + 2
In [15]:
x
Out[15]:
5
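
The other augmented assignments work the same way as `+=`. The sketch below also illustrates what "assignments create object references" means: two names can refer to one and the same object. The names `a` and `b` are hypothetical.

```python
x = 3
x -= 1      # same as x = x - 1, x is now 2
x *= 4      # same as x = x * 4, x is now 8
x /= 2      # same as x = x / 2, x is now 4.0 (true division gives a float)
print(x)    # 4.0

# Assignment binds a name to an object; it does not copy the object.
a = [1, 2]
b = a             # b refers to the same list as a
b.append(3)       # changing the list through b...
print(a)          # [1, 2, 3]  ...is visible through a as well
print(a is b)     # True
```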

Assignment vs Comparison Operators¶

Because the = (assignment) and == (equality comparison) operators look very similar, they can sometimes cause confusion.

In [16]:
x = 3
In [17]:
x
Out[17]:
3
In [18]:
x == 3
Out[18]:
True
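
A brief sketch of the difference in behaviour: `==` only compares and never changes `x`, while writing `=` where a condition is expected is rejected by Python before the code even runs. The built-in `compile()` is used here only to demonstrate that error without stopping the example; `result` and `caught` are illustrative names.

```python
x = 3
result = (x == 5)    # comparison: yields False, x is left unchanged
print(result, x)     # False 3

# Using = in place of == inside a condition is a syntax error.
caught = False
try:
    compile("if x = 3: pass", "<example>", "exec")
except SyntaxError:
    caught = True
    print("SyntaxError: use == to compare values")
```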

Membership operations¶

The in operator returns True if the object on the left side is in the sequence on the right.

In [19]:
'a' in 'abc'
Out[19]:
True
In [20]:
4 in [1, 2, 3] # [1,2,3] is a list
Out[20]:
False
In [21]:
4 not in [1, 2, 3]
Out[21]:
True
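
A few more membership checks, as a sketch. For strings, `in` tests for a contiguous substring rather than for individual characters; tuples and ranges are used here as further examples of sequences and do not appear in the cells above. The name `letters` is illustrative.

```python
letters = "abc"
print("bc" in letters)        # True  ("bc" is a contiguous substring)
print("ac" in letters)        # False (a and c are not adjacent)

print(2 in (1, 2, 3))         # True  (tuples work like lists here)
print(5 in range(10))         # True  (range(10) covers 0 to 9)
print(10 not in range(10))    # True  (the upper bound is excluded)
```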

Markdown formatting basics¶

  • Use _ or * for emphasis (single - italic, double - bold, triple - bold and italic)
    • *one* renders as italic, __two__ as bold and ***three*** as bold italic
  • Headers of decreasing levels follow #, ##, ###, #### and so on
  • (Unordered) Lists follow marker -, + or *
    • Start at the left-most position for top-level
    • Indent four spaces and use another marker for nesting, like here
  • (Numbered) Lists use 1. (counter is auto-incremented)
  • Links have syntax of [some text here](url_here)
  • Images similarly: ![alt text](url or path to image)

Markdown example¶

Some text in *italic* and **bold**

Simple list:

- A
- B

Ordered list:

1. A
1. B

Formula - $Y = X + 5$

Markdown example¶

Some text in italic and bold

Simple list:

  • A
  • B

Ordered list:

  1. A
  2. B

Formula - $Y = X + 5$

Next Week¶

  • Python fundamentals