Week 2: Python Fundamentals¶

Python for Social Data Science¶

Tom Paskhalis¶

The Zen of Python¶

In [1]:
import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Python basics¶

  • Python is an interpreted language (like R and Stata)
  • Every program is executed one command (aka statement) at a time
  • Which also means that work can be done interactively
In [2]:
print("Hello World!")
Hello World!

Python conceptual hierarchy¶

Python programs can be decomposed into modules, statements, expressions, and objects, as follows:

  1. Programs are composed of modules
  2. Modules contain statements
  3. Statements contain expressions
  4. Expressions create and process objects

Python objects¶

  • Everything that Python operates on is an object
  • This includes numbers, strings, data structures, functions, etc.
  • Each object has a type (e.g. string or function) and internal data
  • Objects can be mutable (e.g. list) or immutable (e.g. string)
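
A minimal sketch of the points above, using only the built-in functions type and id. The variable names word and numbers are illustrative and do not appear elsewhere in these slides.

```python
# Every value is an object with a type
word = 'banana'
numbers = [1, 2, 3]
print(type(word))     # <class 'str'>
print(type(numbers))  # <class 'list'>
print(type(len))      # <class 'builtin_function_or_method'>

# Lists are mutable: the same object can be changed in place
numbers_id = id(numbers)
numbers.append(4)
same_object = id(numbers) == numbers_id  # True, still the same object

# Strings are immutable: item assignment raises a TypeError
try:
    word[0] = 'B'
    string_changed = True
except TypeError:
    string_changed = False
```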

Operators¶

Objects and operators are combined to form expressions. Key operators are:

  • Arithmetic (+, -, *, **, /, //, %)
  • Boolean (and, or, not)
  • Relational (==, !=, >, >=, <, <=)
  • Assignment (=, +=, -=, *=, /=)
  • Membership (in)

Basic mathematical operations in Python¶

In [3]:
1 + 1
Out[3]:
2
In [4]:
5 - 3
Out[4]:
2
In [5]:
6 / 2
Out[5]:
3.0
In [6]:
4 * 4
Out[6]:
16
In [7]:
# Exponentiation <- Python comments start with #
2 ** 4
Out[7]:
16
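
The remaining arithmetic operators listed earlier, floor division (//) and modulo (%), are not demonstrated in the cells above. A short sketch, with illustrative variable names:

```python
true_div = 7 / 2     # 3.5, true division always returns a float
quotient = 7 // 2    # 3, floor division drops the fractional part
remainder = 7 % 2    # 1, remainder of the division

# Floor division rounds towards negative infinity, not towards zero
neg_quotient = -7 // 2  # -4

# Quotient and remainder together recover the original number
recovered = 2 * quotient + remainder  # 7
```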

Basic logical operations in Python¶

In [8]:
3 != 1 # Not equal
Out[8]:
True
In [9]:
3 > 3 # Greater than
Out[9]:
False
In [10]:
3 >= 3 # Greater than or equal
Out[10]:
True
In [11]:
False or True # True if either first or second operand is True, False otherwise
Out[11]:
True
In [12]:
3 > 3 or 3 >= 3 # Combining two comparisons with a Boolean operator
Out[12]:
True
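
The other Boolean operators (and, not) follow the same pattern. The sketch below also shows operator precedence and a chained comparison; the variable names are illustrative.

```python
both = True and False  # False, 'and' needs both operands to be True
negated = not True     # False

# Precedence: 'not' binds tighter than 'and', which binds tighter than 'or'
mixed = not False and False or True  # ((not False) and False) or True

# Comparisons can be chained, as in mathematical notation
x = 5
in_range = 0 < x < 10  # same as (0 < x) and (x < 10)
```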

Assignment operations¶

  • Assignments create object references.
  • The target (or name) on the left is bound to the object on the right.
In [13]:
x = 3
In [14]:
x
Out[14]:
3
In [15]:
x += 2 # Increment assignment, equivalent to x = x + 2
In [16]:
x
Out[16]:
5

Assignment vs Comparison Operators¶

Because the = (assignment) and == (equality comparison) operators look very similar, they are sometimes confused.

In [17]:
x = 3
In [18]:
x
Out[18]:
3
In [19]:
x == 3
Out[19]:
True

Membership operations¶

The in operator returns True if the object on the left side is contained in the sequence (or other collection) on the right.

In [20]:
'a' in 'abc'
Out[20]:
True
In [21]:
4 in [1, 2, 3] # [1,2,3] is a list
Out[21]:
False
In [22]:
4 not in [1, 2, 3]
Out[22]:
True

Object types¶

Python objects can have scalar and non-scalar types. Scalar objects are indivisible.

4 main types of scalar objects in Python:

  • Integer (int)
  • Real number (float)
  • Boolean (bool)
  • Null value (None)

Scalar types¶

In [23]:
type(7)
Out[23]:
int
In [24]:
type(3.14)
Out[24]:
float
In [25]:
type(True)
Out[25]:
bool
In [26]:
type(None)
Out[26]:
NoneType
In [27]:
int(3.14) # Scalar type conversion (casting)
Out[27]:
3
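
A few more conversions between scalar types, as a sketch using only built-in functions. The variable names are illustrative.

```python
as_float = float(7)     # 7.0
parsed = int('42')      # 42, strings of digits can be converted too
as_text = str(3.14)     # '3.14'

# int() truncates towards zero, round() rounds to the nearest integer
truncated = int(-3.99)  # -3
rounded = round(3.99)   # 4

# Zero and empty objects convert to False, everything else to True
zero_flag = bool(0)         # False
text_flag = bool('False')   # True, any non-empty string is truthy

# bool is a subclass of int, so True behaves like 1 in arithmetic
total = True + True     # 2
```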

Non-scalar types¶

In contrast to scalars, non-scalar objects have some internal structure. For sequences (strings, tuples, lists) this allows indexing, slicing and other interesting operations.

The most common non-scalar types in Python are:

  • String (str) - immutable ordered sequence of characters
  • Tuple (tuple) - immutable ordered sequence of elements
  • List (list) - mutable ordered sequence of elements
  • Set (set) - mutable unordered collection of unique elements
  • Dictionary (dict) - mutable collection of key-value pairs, accessed by key rather than by position (insertion order is preserved since Python 3.7)

Examples of non-scalar types¶

In [28]:
s = 'time flies like a banana'
t = (0, 'one', 1, 2)
l = [0, 'one', 1, 2]
o = {'apple', 'banana', 'watermelon'}
d = {'apple': 150.0, 'banana': 120.0, 'watermelon': 3000.0}
In [29]:
type(s)
Out[29]:
str
In [30]:
type(t)
Out[30]:
tuple
In [31]:
type(l)
Out[31]:
list
In [32]:
type(o)
Out[32]:
set
In [33]:
type(d)
Out[33]:
dict

Indexing and subsetting in Python¶

  • Indexing can be used to subset individual elements from a sequence
  • Slicing can be used to extract a sub-sequence of arbitrary length
  • Use square brackets [] to supply the index (indices) of elements:
object[index]
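
As a quick sketch of the notation, assuming an illustrative list called letters (not used elsewhere in these slides):

```python
letters = ['a', 'b', 'c', 'd', 'e']

first = letters[0]     # 'a', indexing starts from 0
last = letters[-1]     # 'e', negative indices count from the end

# Slices are written start:stop, and the stop index is excluded
middle = letters[1:4]  # ['b', 'c', 'd']
head = letters[:2]     # ['a', 'b'], omitted start means 'from the beginning'
tail = letters[3:]     # ['d', 'e'], omitted stop means 'to the end'
```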

Indexing in Python starts from 0¶

Source: xkcd
Extra: Why Python uses 0-based indexing by Guido van Rossum
Extra: Why numbering should start at zero by Edsger Dijkstra

Strings¶

In [34]:
s
Out[34]:
'time flies like a banana'
In [35]:
len(s) # length of string (including whitespaces)
Out[35]:
24
In [36]:
s[0] # Subset 1st element (indexing in Python starts from zero!)
Out[36]:
't'
In [37]:
s[5:] # Subset all elements starting from 6th
Out[37]:
'flies like a banana'
In [38]:
s + '!' # Strings can be concatenated together
Out[38]:
'time flies like a banana!'

Objects have methods¶

  • Python objects of built-in types have methods associated with them
  • They can be thought of as function-like objects
  • However, their syntax is object.method() as opposed to function(object)
In [39]:
len(s) # Function
Out[39]:
24
In [40]:
s.upper() # Method (makes string upper-case)
Out[40]:
'TIME FLIES LIKE A BANANA'

String methods¶

Some examples of methods associated with strings. More details here.

In [41]:
s.capitalize() # Note that only the first character gets capitalized
Out[41]:
'Time flies like a banana'
In [42]:
s.split(sep = ' ') # Here we supply an argument 'sep' to our method call
Out[42]:
['time', 'flies', 'like', 'a', 'banana']
In [43]:
s.replace(' ', '-') # Arguments can also be matched by position, not just name
Out[43]:
'time-flies-like-a-banana'
In [44]:
'-'.join(s.split(sep = ' ')) # Method calls can be nested within each other
Out[44]:
'time-flies-like-a-banana'
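
Because strings are immutable, none of these methods changes s itself; each call returns a new object. A brief sketch that redefines s so it runs on its own (the other variable names are illustrative):

```python
s = 'time flies like a banana'

shouted = s.upper()  # a new string, 'TIME FLIES LIKE A BANANA'
unchanged = s == 'time flies like a banana'  # True, s is not modified

# Without arguments, split() separates on any run of whitespace
words = s.split()
n_words = len(words)  # 5

# To keep a result, bind it to a name (possibly the same one)
s = s.replace('banana', 'arrow')
```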

Tuples¶

In [45]:
t # Tuples can contain elements of different types
Out[45]:
(0, 'one', 1, 2)
In [46]:
len(t)
Out[46]:
4
In [47]:
t[1:]
Out[47]:
('one', 1, 2)
In [48]:
t + ('three', 5) # Like strings, tuples can be concatenated
Out[48]:
(0, 'one', 1, 2, 'three', 5)

Lists¶

In [49]:
l # Like tuples, lists can contain elements of different types
Out[49]:
[0, 'one', 1, 2]
In [50]:
l[1] = 1 # Unlike tuples, lists are mutable
In [51]:
l
Out[51]:
[0, 1, 1, 2]
In [52]:
t[1] = 1 # Compare to tuple
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-52-4e4114da061e> in <module>
----> 1 t[1] = 1 # Compare to tuple

TypeError: 'tuple' object does not support item assignment

Indexing and slicing lists¶

In [53]:
l
Out[53]:
[0, 1, 1, 2]
In [54]:
l[1:] # Subset all elements starting from 2nd
Out[54]:
[1, 1, 2]
In [55]:
l[-1] # Subset the last element
Out[55]:
2
In [56]:
l[::2] # Subset every second element, list[start:stop:step]
Out[56]:
[0, 1]
In [57]:
l[::-1] # Subset all elements in reverse order
Out[57]:
[2, 1, 1, 0]
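
Since lists are mutable, they can also grow in place, and the full list[start:stop:step] form can combine all three parts. A sketch that redefines l so it runs on its own; the other names are illustrative.

```python
l = [0, 1, 1, 2]

l.append(3)       # add one element at the end: [0, 1, 1, 2, 3]
l.extend([5, 8])  # add several elements: [0, 1, 1, 2, 3, 5, 8]

# start=1, stop=6 (excluded), step=2 picks indices 1, 3 and 5
window = l[1:6:2]  # [1, 2, 5]

# sorted() returns a new list and leaves the original untouched
descending = sorted(l, reverse=True)  # [8, 5, 3, 2, 1, 1, 0]
```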

Sets¶

In [58]:
o
Out[58]:
{'apple', 'banana', 'watermelon'}
In [59]:
{'apple', 'apple', 'banana', 'watermelon'} # Sets retain only unique values
Out[59]:
{'apple', 'banana', 'watermelon'}
In [60]:
{'apple'} < o # Sets can be compared (here: {'apple'} is a proper subset of o)
Out[60]:
True
In [61]:
o[1] # Unlike strings, tuples and lists, sets are unordered
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-61-6a3d97725b65> in <module>
----> 1 o[1] # Unlike strings, tuples and lists, sets are unordered

TypeError: 'set' object is not subscriptable

Set methods in Python¶

Set methods example¶

Source: Wikipedia

Set methods example continued¶

In [62]:
nordic = {'Denmark', 'Iceland', 'Finland', 'Norway', 'Sweden'}
eu = {'Denmark', 'Finland', 'Sweden'}
krones = {'Denmark', 'Sweden'}
In [63]:
euro = eu.difference(krones) # The same can be expressed using the infix operator: `eu - krones`
euro
Out[63]:
{'Finland'}
In [64]:
efta = nordic.difference(eu).union({'Liechtenstein', 'Switzerland'}) # Method calls can also be 'chained'
efta
Out[64]:
{'Iceland', 'Liechtenstein', 'Norway', 'Switzerland'}
In [65]:
efta.intersection(nordic) # efta & nordic
Out[65]:
{'Iceland', 'Norway'}
In [66]:
schengen = efta.union(eu) # efta | eu
schengen
Out[66]:
{'Denmark',
 'Finland',
 'Iceland',
 'Liechtenstein',
 'Norway',
 'Sweden',
 'Switzerland'}
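
The same results can be obtained with the infix operators mentioned in the comments. The sketch below repeats the set definitions so it runs on its own; sym_diff and is_subset are illustrative names.

```python
nordic = {'Denmark', 'Iceland', 'Finland', 'Norway', 'Sweden'}
eu = {'Denmark', 'Finland', 'Sweden'}
krones = {'Denmark', 'Sweden'}

euro = eu - krones                                      # difference
efta = (nordic - eu) | {'Liechtenstein', 'Switzerland'} # union
nordic_efta = efta & nordic                             # intersection
schengen = efta | eu

# Symmetric difference: elements in exactly one of the two sets
sym_diff = nordic ^ eu  # {'Iceland', 'Norway'}

# <= tests for subset, < for proper subset
is_subset = krones <= eu  # True
```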

Dictionaries¶

In [67]:
d
Out[67]:
{'apple': 150.0, 'banana': 120.0, 'watermelon': 3000.0}
In [68]:
d['apple'] # Unlike strings, tuples and lists, dictionaries are indexed by 'keys'
Out[68]:
150.0
In [69]:
d[0] # Rather than integers
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-69-3cd4cfa8b308> in <module>
----> 1 d[0] # Rather than integers

KeyError: 0
In [70]:
d['strawberry'] = 12.0 # They are, however, mutable like lists and sets
d
Out[70]:
{'apple': 150.0, 'banana': 120.0, 'watermelon': 3000.0, 'strawberry': 12.0}
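
A KeyError like the one above can be avoided by checking for the key first or by using the get method. A minimal sketch, redefining d so it runs on its own; 'kiwi' is an illustrative key that is deliberately absent.

```python
d = {'apple': 150.0, 'banana': 120.0, 'watermelon': 3000.0}

has_apple = 'apple' in d   # True, 'in' checks the keys
has_kiwi = 'kiwi' in d     # False

# get() returns None (or a supplied default) instead of raising KeyError
missing = d.get('kiwi')        # None
fallback = d.get('kiwi', 0.0)  # 0.0

# Keys, values and key-value pairs can be extracted as lists
keys = list(d.keys())      # ['apple', 'banana', 'watermelon']
values = list(d.values())  # [150.0, 120.0, 3000.0]
pairs = list(d.items())    # [('apple', 150.0), ...]
```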

Conversion between non-scalar types¶

In [71]:
t ## Tuple
Out[71]:
(0, 'one', 1, 2)
In [72]:
list(t) ## Convert to list with a `list` function
Out[72]:
[0, 'one', 1, 2]
In [73]:
[x for x in t] ## List comprehension, [expr for elem in iterable if test]
Out[73]:
[0, 'one', 1, 2]
In [74]:
set([0, 1, 1, 2]) ## Conversion to set retains only unique values
Out[74]:
{0, 1, 2}
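
The comprehension above omits the optional "if test" part. A sketch of the full form, redefining t so it runs on its own (the other names are illustrative):

```python
t = (0, 'one', 1, 2)

# Keep only the elements that pass the test
ints_only = [x for x in t if isinstance(x, int)]  # [0, 1, 2]

# The expression can transform each element
squares = [x ** 2 for x in ints_only]  # [0, 1, 4]

# Other constructors work like list()
as_tuple = tuple(squares)               # (0, 1, 4)
as_dict = dict([('a', 1), ('b', 2)])    # {'a': 1, 'b': 2}
```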

None value¶

  • None is a Python null object
  • It is often used to initialize objects
  • And it is a return value in some functions (more on that later)
In [75]:
# Initialization of some temporary variable, which can be re-assigned to another value later
tmp = None
In [76]:
# Here we are initializing a list of length 10
tmp_l = [None] * 10
tmp_l
Out[76]:
[None, None, None, None, None, None, None, None, None, None]
In [77]:
None == None
Out[77]:
True
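
In practice, the usual way to test for None is the identity operator is rather than ==, since there is only one None object. A short sketch with illustrative names; it also shows a built-in method that returns None.

```python
tmp = None
is_missing = tmp is None  # True

# Empty objects are 'falsy' but they are not None
empty = []
empty_is_none = empty is None  # False

# Methods that modify an object in place typically return None
numbers = [3, 1, 2]
result = numbers.sort()  # numbers is now [1, 2, 3], result is None
```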

Aliasing vs copying in Python¶

  • Assignment binds the variable name on the left of the = sign to the object on the right.
  • The same object can have several names (aliases).
  • Immutable objects cannot be changed in place, so an operation that appears to modify one creates a new object and leaves other names pointing at the original.
  • Mutable objects (lists, sets, dictionaries) can be changed in place, so a change made through one name is visible through every alias, which can create hard-to-track problems.

Example of aliasing/copying for immutable types¶

In [78]:
x = 'test' # Object of type string is assigned to variable 'x'
x
Out[78]:
'test'
In [79]:
y = x # y is created as an alias (alternative name) of x
y
Out[79]:
'test'
In [80]:
x = 'rest' # Another object of type string is assigned to 'x'
x
Out[80]:
'rest'
In [81]:
y
Out[81]:
'test'

Example of aliasing/copying for mutable types¶

In [82]:
d
Out[82]:
{'apple': 150.0, 'banana': 120.0, 'watermelon': 3000.0, 'strawberry': 12.0}
In [83]:
d1 = d # Just an alias
d2 = d.copy() # Create a copy
d['watermelon'] = 500 # Modify original dictionary
In [84]:
d1
Out[84]:
{'apple': 150.0, 'banana': 120.0, 'watermelon': 500, 'strawberry': 12.0}
In [85]:
d2
Out[85]:
{'apple': 150.0, 'banana': 120.0, 'watermelon': 3000.0, 'strawberry': 12.0}
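
The same behaviour can be checked for lists with the is operator, which tests whether two names refer to the same object. The sketch also shows that copy() is shallow, so nested objects need copy.deepcopy from the standard library. All variable names are illustrative.

```python
import copy

a = [1, 2, 3]
b = a         # alias: a second name for the same list
c = a.copy()  # shallow copy: a new list with the same elements

a.append(4)
alias_same = b is a   # True, b sees the change: [1, 2, 3, 4]
copy_same = c is a    # False, c is still [1, 2, 3]

# A shallow copy shares nested objects with the original
nested = [[1, 2], [3, 4]]
shallow = nested.copy()
deep = copy.deepcopy(nested)

nested[0].append(99)
shallow_first = shallow[0]  # [1, 2, 99], inner list is shared
deep_first = deep[0]        # [1, 2], fully independent
```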

Summary of built-in object types in Python¶

Type   Description    Scalar      Mutability  Order
int    integer        scalar      immutable
float  real number    scalar      immutable
bool   Boolean        scalar      immutable
None   Python 'Null'  scalar      immutable
str    string         non-scalar  immutable   ordered
tuple  tuple          non-scalar  immutable   ordered
list   list           non-scalar  mutable     ordered
set    set            non-scalar  mutable     unordered
dict   dictionary     non-scalar  mutable     insertion-ordered (Python 3.7+)

Extensive documentation on built-in types

Modules¶

  • Python's power lies in its extensibility
  • This is usually achieved by loading additional modules (libraries)
  • A module can be just a .py file that you import into your program (script)
  • However, often this refers to external libraries installed using pip or conda
  • Standard Python installation also includes a number of modules (full list here)
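
A sketch of the three common forms of the import statement, using modules from the standard library only. The alias st is an arbitrary choice made for this example.

```python
import math                      # import the whole module
import statistics as st          # import under a shorter alias
from collections import Counter  # import a single name from a module

root = math.sqrt(16)       # 4.0, accessed as module.function
average = st.mean([1, 2, 3])  # 2, accessed through the alias

# Counter counts how often each element occurs
top_letter = Counter('banana').most_common(1)  # [('a', 3)]
```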

Basic statistical operations¶

In [86]:
import statistics # Standard Python module
fib = [0, 1, 1, 2, 3, 5]
In [87]:
statistics.mean(fib) # Mean
Out[87]:
2
In [88]:
statistics.median(fib) # Median
Out[88]:
1.5
In [89]:
statistics.mode(fib) # Mode
Out[89]:
1
In [90]:
statistics.stdev(fib) # Standard deviation
Out[90]:
1.7888543819998317
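
To see what stdev computes, the sample standard deviation can be reproduced by hand: the square root of the sum of squared deviations from the mean, divided by n - 1. A sketch using only the standard library, with illustrative variable names:

```python
import math
import statistics

fib = [0, 1, 1, 2, 3, 5]
n = len(fib)

mean = sum(fib) / n                                # 2.0
squared_dev = sum((x - mean) ** 2 for x in fib)    # 16.0

# Sample standard deviation divides by n - 1
sample_sd = math.sqrt(squared_dev / (n - 1))
matches = math.isclose(sample_sd, statistics.stdev(fib))  # True

# Population standard deviation divides by n instead
population_sd = statistics.pstdev(fib)
```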

Help!¶

Python has a built-in help() function which provides more information about any object; in IPython and Jupyter, prefixing a name with ? does the same:

In [91]:
?s
In [92]:
help(s.join)
Help on built-in function join:

join(iterable, /) method of builtins.str instance
    Concatenate any number of strings.
    
    The string whose method is called is inserted in between each given string.
    The result is returned as a new string.
    
    Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'

  • The quality of the documentation varies hugely across libraries
  • Stack Overflow is a good resource for many standard tasks
  • For custom packages it is often helpful to check the issues page on GitHub
  • E.g. for pandas: https://github.com/pandas-dev/pandas/issues
  • Or, indeed, any search engine #LMDDGTFY

Next Week¶

  • Control Flow and Functions