# Week 8: Fundamentals of Python Programming I

POP77001 Computer Programming for Social Scientists

Tom Paskhalis

## Overview

-   Python programs and their components
-   Objects and operators
-   Scalar and non-scalar types
-   Indexing
-   Methods and functions

# Introduction to Python

## 

<figure>
<img src="https://imgs.xkcd.com/comics/python.png" alt="xkcd" />
<figcaption aria-hidden="true"><a
href="https://xkcd.com/353/">xkcd</a></figcaption>
</figure>

## Python Background

![](https://gvanrossum.github.io/images/DO6GvRlo.gif)

![](https://s3.dualstack.us-east-2.amazonaws.com/pythondotorg-assets/media/community/logos/python-logo-only.png)

-   Started as a side-project in 1989 by Guido van Rossum, BDFL
    (benevolent dictator for life) until 2018.
-   Python 3, first released in 2008, is the current major version.
-   Python 2 support stopped on 1 January 2020.

## The Zen of Python

In [None]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

## Python Basics

-   Python is an **interpreted** language (like R and Stata).
-   Every program is executed one *command* (aka *statement*) at a time.
-   Which also means that work can be done interactively.

In [None]:
'POP' + '77001'

'POP77001'

## Python Conceptual Hierarchy

-   Python programs can be decomposed into modules, statements,
    expressions, and objects, as follows:
    1.  *Programs* are composed of *modules*
    2.  *Modules* contain *statements*
    3.  *Statements* contain *expressions*
    4.  *Expressions* create and process *objects*
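
A rough way to see these four levels in a single script is sketched below. The variable names (`radius`, `area`) are made up for illustration.

```python
# This file as a whole would be a module; `math` is another (standard-library) module
import math                    # A statement that loads a module

radius = 2.0                   # An assignment statement; `2.0` is an expression creating a float object
area = math.pi * radius ** 2   # The expression on the right creates and processes float objects
print(area)                    # A statement consisting of a single function-call expression
```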

## Python Objects

-   Everything that Python operates on is an object.
-   This includes numbers, strings, data structures, functions, etc.
-   Each object has a **type** (e.g. string or function) and internal
    data.
-   Objects can be **mutable** (e.g. list) or **immutable**
    (e.g. string).
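
As a small sketch of these points, the snippet below inspects the type of a few objects and changes a mutable one in place. The variable names are illustrative only.

```python
number = 42
text = 'POP77001'

# Numbers, strings and even functions are objects, each with a type
number_type = type(number)    # int
text_type = type(text)        # str
function_type = type(len)     # builtin_function_or_method

# A list is mutable (can be changed in place); a string is immutable
fruits = ['apple', 'banana']
fruits[0] = 'cherry'          # Allowed: fruits is now ['cherry', 'banana']
```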

# Operations

## Operators

-   Objects and operators are combined to form **expressions**.
-   Key **operators** are:
    -   Assignment (`=`, `+=`, `-=`, `*=`, `/=`)
    -   Arithmetic (`+`, `-`, `*`, `**`, `/`, `//`, `%`)
    -   Boolean (`and`, `or`, `not`)
    -   Relational (`==`, `!=`, `>`, `>=`, `<`, `<=`)
    -   Membership (`in`)

## Mathematical Operations

Arithmetic operations:

. . .

In [None]:
1 + 1

2

. . .

In [None]:
5 - 3

2

. . .

In [None]:
6 / 2

3.0

. . .

In [None]:
4 * 4

16

. . .

In [None]:
# As in R, Python comments start with #
# Exponentiation
2 ** 4

16

. . .

Advanced mathematical operations:

. . .

In [None]:
# Integer division (remainder is discarded)
7 // 3

2

. . .

In [None]:
# Modulo operation (only remainder is retained)
7 % 3

1
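
Integer division and modulo are two halves of the same operation. The sketch below, with made-up variable names, shows how they fit together and what happens with a negative operand.

```python
a, b = 7, 3
quotient = a // b                          # 2
remainder = a % b                          # 1

# The two results always satisfy: a == b * quotient + remainder
reconstructed = b * quotient + remainder   # 7

# The built-in divmod() returns both at once
pair = divmod(a, b)                        # (2, 1)

# Integer (floor) division rounds towards negative infinity, not towards zero
negative_quotient = -7 // 3                # -3
negative_remainder = -7 % 3                # 2
```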

## Logical Operations

In [None]:
3 != 1 # Not equal

True

. . .

In [None]:
3 > 3 # Greater than

False

. . .

In [None]:
3 >= 3 # Greater than or equal

True

. . .

In [None]:
# True if either first or second operand is True, False otherwise
# Analogous to R's | operator
False or True 

True

. . .

In [None]:
# True if both first and second operand are True, False otherwise
# Analogous to R's & operator
False and True 

False

. . .

In [None]:
3 > 3 or 3 >= 3 # Combining two Boolean expressions

True
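
The remaining Boolean operator, `not`, and chained comparisons can be sketched in the same way. The variable names here are hypothetical.

```python
is_weekend = False
has_deadline = True

works_today = not is_weekend                 # Negation: True
busy_weekend = is_weekend and has_deadline   # Both must be True: False
any_reason = is_weekend or has_deadline      # At least one must be True: True

# Relational operators can be chained; this is equivalent to (0 < 3) and (3 <= 3)
in_range = 0 < 3 <= 3                        # True
```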

## Membership Operations

Operator `in` returns `True` if the object on the left side is in the
sequence on the right.

In [None]:
# Strings are also sequences in Python
'a' in 'abc'

True

. . .

In [None]:
3 in [1, 2, 3] # [1,2,3] is a list

True

. . .

In [None]:
3 not in [1, 2, 3]

False

## Operator Precedence

Operators are listed from highest precedence (binds most tightly) to lowest:

| Operator | Description |
|:-----------------------------|:-----------------------------------------|
| (expressions…), \[expressions…\], {key: value…}, {expressions…} | Binding or parenthesized expression, list display, dictionary display, set display |
| x\[index\], x\[index:index\], x(arguments…), x.attribute | Subscription, slicing, call, attribute reference |
| await x | Await expression |
| \*\* | Exponentiation |
| +x, -x, ~x | Positive, negative, bitwise NOT |
| \*, @, /, //, % | Multiplication, matrix multiplication, division, floor division, remainder |
| +, - | Addition and subtraction |
| \<\<, \>\> | Shifts |
| & | Bitwise AND |
| ^ | Bitwise XOR |
| \| | Bitwise OR |
| `in`, `not in`, `is`, `is not`, \<, \<=, \>, \>=, !=, == | Comparisons, including membership tests and identity tests |
| not x | Boolean NOT |
| and | Boolean AND |
| or | Boolean OR |
| if – else | Conditional expression |
| lambda | Lambda expression |
| := | Assignment expression |

> **Extra**
>
> [Python Documentation on Operator
> Precedence](https://docs.python.org/3/reference/expressions.html#operator-precedence)
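
A few expressions where precedence changes the result are sketched below. The single-letter names are placeholders.

```python
a = 2 + 3 * 4                       # Multiplication first: 2 + 12 = 14
b = (2 + 3) * 4                     # Parentheses override precedence: 20
c = -2 ** 2                         # Exponentiation binds tighter than unary minus: -(2 ** 2) = -4
d = not True or True                # `not` before `or`: (not True) or True = True
e = 1 + 1 == 2 and 3 in [1, 2, 3]   # Arithmetic, then comparisons, then `and`: True
```

When in doubt, adding parentheses makes the intended order explicit.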

# Assignment

## Assignment Operations

-   Assignments create object references.
-   **Target** (or **name**) on the left is assigned to **object** on
    the right.

In [None]:
x = 3

. . .

In [None]:
x

3

. . .

![](attachment:08_lecture_python_i_files/figure-ipynb/mermaid-figure-1.png)

. . .

In [None]:
x += 2 # Increment assignment, equivalent to x = x + 2

. . .

In [None]:
x

5

## Assignment vs Comparison

Because the `=` (assignment) and `==` (equality comparison) operators look
very similar, they are easy to confuse.

In [None]:
x = 3 # Assignment

. . .

In [None]:
x

3

. . .

In [None]:
x == 3 # Equality comparison

True

# Object Types

## Divisibility & Mutability

-   In Python it is useful to think of objects storing:
    -   Individual values (**scalars**) or
    -   **Sequences** of elements
-   Scalar objects are indivisible and immutable.
-   Sequences can be both **mutable** and **immutable**.
-   4 main types of scalar objects in Python:
    -   Integer (`int`)
    -   Real number (`float`)
    -   Boolean (`bool`)
    -   Null value (`None`)

## Scalar Types

-   All scalar types are indivisible and immutable

. . .

In [None]:
type(7)

<class 'int'>

. . .

In [None]:
type(3.14)

<class 'float'>

. . .

In [None]:
type(True)

<class 'bool'>

. . .

In [None]:
# None is the only object of NoneType
type(None)

<class 'NoneType'>

. . .

-   Scalar type conversion (casting) can be done using type names as
    functions:

In [None]:
int(3.14) 

3

. . .

In [None]:
str(42) 

'42'
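
Some further conversions, including a few that tend to surprise, are sketched below. The variable names are illustrative.

```python
whole = int('42')          # String to integer: 42
real = float('3.14')       # String to float: 3.14
truncated = int(-3.99)     # Conversion to int truncates towards zero: -3
rounded = round(3.99)      # round() goes to the nearest integer instead: 4
zero = bool(0)             # Zero converts to False
empty = bool('')           # An empty string converts to False
non_empty = bool('False')  # Any non-empty string converts to True, even 'False'
```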

## Non-scalar Types

-   Non-scalar objects are collections of elements; the ordered ones are
    called **sequences**.
-   Sequences allow indexing, slicing and other interesting operations.
-   Most common non-scalar types in Python are:
    -   String (`str`) - *immutable ordered* sequence of Unicode
        characters
    -   Tuple (`tuple`) - *immutable ordered* sequence of elements
    -   List (`list`) - *mutable ordered* sequence of elements
    -   Set (`set`) - *mutable unordered* collection of unique elements
    -   Dictionary (`dict`) - *mutable* collection of key-value pairs,
        accessed by key rather than position (insertion order is kept
        since Python 3.7)

## Sequences: Example

In [None]:
s = 'time flies like a banana'
t = (0, 'one', 1, 2)
l = [0, 'one', 1, 2]
o = {'apple', 'banana', 'watermelon'}
d = {'apple': 150.0, 'banana': 120.0, 'watermelon': 3000.0}

. . .

In [None]:
type(s)

<class 'str'>

. . .

In [None]:
type(t)

<class 'tuple'>

. . .

In [None]:
type(l)

<class 'list'>

. . .

In [None]:
type(o)

<class 'set'>

. . .

In [None]:
type(d)

<class 'dict'>

# Working with Objects

## Indexing and Subsetting in Python

-   **Indexing** can be used to subset individual elements from a
    sequence.
-   **Slicing** can be used to extract a sub-sequence of arbitrary length.
-   Use square brackets `[]` to supply the index (indices) of elements:

<!-- -->

    object[index]

## Indexing in Python Starts from 0

<figure>
<img src="https://imgs.xkcd.com/comics/donald_knuth.png" alt="xkcd" />
<figcaption aria-hidden="true"><a
href="https://xkcd.com/163/">xkcd</a></figcaption>
</figure>

> **Extra**
>
> [Why Python uses 0-based indexing by Guido van
> Rossum](https://python-history.blogspot.com/2013/10/why-python-uses-0-based-indexing.html)
>
> [Why numbering should start at zero by Edsger
> Dijkstra](https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html)

## Subsetting Strings

In [None]:
s

'time flies like a banana'

. . .

In [None]:
# Length of string (including whitespace)
len(s)

24

. . .

In [None]:
# Subset 1st element (indexing in Python starts from zero!)
s[0]

't'

. . .

In [None]:
# Subset all elements starting from 6th
s[5:]

'flies like a banana'

. . .

In [None]:
# Strings can be concatenated together
s + '!'

'time flies like a banana!'
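
A few more ways of subsetting the same string are sketched below, along with a check that strings cannot be changed in place. The variable names are made up for the example.

```python
s = 'time flies like a banana'

first_word = s[:4]     # Elements 1 to 4 (indices 0-3): 'time'
second_word = s[5:10]  # Indices 5-9; the stop index 10 is excluded: 'flies'
last_word = s[-6:]     # Negative indices count from the end: 'banana'

# Strings are immutable, so item assignment raises a TypeError
try:
    s[0] = 'T'
    assignment_failed = False
except TypeError:
    assignment_failed = True
```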

## Objects Have Methods

-   Python objects have **methods** associated with them.
-   They can be thought of as function-like objects.
-   However, their syntax is `object.method()`
-   As opposed to `function(object)`.

In [None]:
len(s) # Function

24

. . .

In [None]:
s.upper() # Method (makes string upper-case)

'TIME FLIES LIKE A BANANA'

## String Methods

Some examples of methods available for strings:

In [None]:
# Note that only the first character gets capitalized
s.capitalize()

'Time flies like a banana'

. . .

In [None]:
# Here we supply an argument 'sep' to our method call
s.split(sep = ' ')

['time', 'flies', 'like', 'a', 'banana']

. . .

In [None]:
# Arguments can also be matched by position, not just name
s.replace(' ', '-')

'time-flies-like-a-banana'

. . .

In [None]:
# Method calls can be nested within each other
'-'.join(s.split(sep = ' '))

'time-flies-like-a-banana'

> **Extra**
>
> [Python string
> methods](https://docs.python.org/3/library/stdtypes.html#string-methods)

## Method Chaining

-   Methods can be chained together to perform multiple operations on
    the same object.
-   The output of one method becomes the input of the next method.

In [None]:
s

'time flies like a banana'

. . .

-   Instead of applying string methods one by one, we can chain them
    together:

In [None]:
s.replace(' a ', ' an ').replace('banana', 'arrow').capitalize().split(sep = ' ')

['Time', 'flies', 'like', 'an', 'arrow']

. . .

-   For ease of reading, we can break the chain into multiple lines:

In [None]:
s_as_l = (
  s
  .replace(' a ', ' an ')
  .replace('banana', 'arrow')
  .capitalize()
  .split(sep = ' ')
)
s_as_l

['Time', 'flies', 'like', 'an', 'arrow']
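
For comparison, the same chain can be unrolled into separate steps. This is only a sketch with hypothetical intermediate names (`step1` to `step4`); it shows that each method returns a new object and leaves `s` untouched.

```python
s = 'time flies like a banana'

step1 = s.replace(' a ', ' an ')           # 'time flies like an banana'
step2 = step1.replace('banana', 'arrow')   # 'time flies like an arrow'
step3 = step2.capitalize()                 # 'Time flies like an arrow'
step4 = step3.split(sep=' ')               # A list of five words
```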

## Tuples

Tuples can contain elements of different types:

In [None]:
t

(0, 'one', 1, 2)

. . .

In [None]:
len(t)

4

. . .

In [None]:
t[1:]

('one', 1, 2)

. . .

Like strings, tuples can be concatenated:

In [None]:
t + ('three', 5)

(0, 'one', 1, 2, 'three', 5)

## Lists

Like tuples, lists can contain elements of different types:

In [None]:
l

[0, 'one', 1, 2]

. . .

Unlike tuples, lists are mutable:

In [None]:
l[1] = 1

. . .

In [None]:
l

[0, 1, 1, 2]

. . .

In [None]:
# Compare to tuple
t[1] = 1

TypeError: 'tuple' object does not support item assignment

## Indexing and Slicing Lists

-   Indexing and slicing lists (and other ordered sequences) is similar
    to using the `seq()` function in R.
-   The general syntax for slicing in Python is `seq[start:stop:step]`.
-   Note that the `stop` index is not included in the slice.
-   If `start` or `stop` are omitted, they default to the beginning and
    end of the sequence respectively.
-   If `step` is omitted, it defaults to 1.
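
The rules above can be sketched on a list of ten integers. The list `numbers` and the result names are made up for the example.

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

first_three = numbers[:3]      # start omitted: [0, 1, 2]
middle = numbers[2:5]          # stop (index 5) is excluded: [2, 3, 4]
tail = numbers[7:]             # stop omitted: [7, 8, 9]
every_third = numbers[1:8:3]   # start 1, stop 8, step 3: [1, 4, 7]
backwards = numbers[::-2]      # A negative step walks from the end: [9, 7, 5, 3, 1]
```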

## Indexing and Slicing Lists: Example

In [None]:
l

[0, 1, 1, 2]

. . .

In [None]:
# Subset all elements starting from 2nd
l[1:]

[1, 1, 2]

. . .

In [None]:
# Subset the last element
l[-1]

2

. . .

In [None]:
# Subset every second element,
# list[start:stop:step]
l[::2]

[0, 1]

. . .

In [None]:
# Subset all elements in reverse order
l[::-1] 

[2, 1, 1, 0]

## Sets

In [None]:
o

{'apple', 'watermelon', 'banana'}

. . .

In [None]:
# Sets retain only unique values
{'apple', 'apple', 'banana', 'watermelon'}

{'apple', 'watermelon', 'banana'}

. . .

In [None]:
# Sets also have methods
o.difference({'banana'})

{'apple', 'watermelon'}

. . .

In [None]:
# Some methods can be expressed as operators
o - {'banana'}

{'apple', 'watermelon'}

. . .

In [None]:
# Sets can be compared (e.g. one being subset of another)
{'apple'} < o

True

. . .

In [None]:
# Unlike strings, tuples and lists, sets are unordered
o[1]

TypeError: 'set' object is not subscriptable
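
Other common set operations also have operator forms. The sketch below assumes a hypothetical second set, `other`.

```python
o = {'apple', 'banana', 'watermelon'}
other = {'banana', 'cherry'}   # Hypothetical second set

union = o | other              # Elements in either set
intersection = o & other       # Elements in both sets: {'banana'}
only_one = o ^ other           # Elements in exactly one of the two sets
has_apple = 'apple' in o       # Membership test: True
```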

## Dictionaries

In [None]:
# key:value pair, fruit_name:average_weight
d

{'apple': 150.0, 'banana': 120.0, 'watermelon': 3000.0}

. . .

In [None]:
# Unlike strings, tuples and lists, dictionaries are indexed by 'keys'
d['apple']

150.0

. . .

In [None]:
# Rather than integers
d[0] 

KeyError: 0

. . .

In [None]:
# They are, however, mutable like lists and sets
d['strawberry'] = 12.0
d

{'apple': 150.0, 'banana': 120.0, 'watermelon': 3000.0, 'strawberry': 12.0}
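
A few more ways of looking inside a dictionary are sketched below. The key `'kiwi'` is a made-up example of a missing key.

```python
d = {'apple': 150.0, 'banana': 120.0, 'watermelon': 3000.0}

fruit_names = list(d.keys())   # ['apple', 'banana', 'watermelon']
weights = list(d.values())     # [150.0, 120.0, 3000.0]
has_apple = 'apple' in d       # `in` tests keys: True
has_weight = 150.0 in d        # Values are not searched: False
missing = d.get('kiwi')        # None instead of a KeyError
default = d.get('kiwi', 0.0)   # A fallback value supplied by the caller: 0.0
```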

## Conversion of Non-scalar Types

In [None]:
## Tuple
t

(0, 'one', 1, 2)

. . .

In [None]:
## Convert to list with a `list` function
list(t)

[0, 'one', 1, 2]

. . .

In [None]:
## Conversion to set retains only unique values
set([0, 1, 1, 2])

{0, 1, 2}

. . .

-   **List comprehension**, a more Pythonic way of implementing loops
    and conditionals, can also be used for converting sequences.
-   It has the general form of
    `[expr for elem in iterable if test]`.

In [None]:
[x for x in t]

[0, 'one', 1, 2]

In [None]:
[x for x in t if type(x) != str]

[0, 1, 2]
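
To make the general form concrete, the sketch below uses an expression that transforms each element and then spells out the same logic as an explicit loop. The names `doubled` and `doubled_loop` are illustrative.

```python
t = (0, 'one', 1, 2)

# Comprehension: double every element that is not a string
doubled = [x * 2 for x in t if type(x) != str]   # [0, 2, 4]

# The same logic written as an explicit loop
doubled_loop = []
for x in t:
    if type(x) != str:
        doubled_loop.append(x * 2)
```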

## `None` Value

-   `None` is a Python null object.
-   It is often used to initialize objects.
-   And it is a return value in some functions (more on that later).

In [None]:
# Initialization of some temporary variable, which can be re-assigned to another value later
none = None
none

. . .

In [None]:
# Here we are initializing a list of length 10
none_l = [None] * 10
none_l

[None, None, None, None, None, None, None, None, None, None]

. . .

In [None]:
# Note the difference with R's NA 
None == None

True
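
One place where `None` appears as a return value is a method that modifies its object in place. The sketch below uses `list.sort()` as an example; the variable names are made up.

```python
values = [3, 1, 2]
result = values.sort()           # list.sort() sorts in place and returns None

returned_none = result is None   # True; `is` is the idiomatic test for None
# values is now [1, 2, 3]
```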

## Aliasing vs Copying in Python

-   Assignment binds the name on the left of `=` sign to the object on
    the right.
-   But the same object can have different names (**aliases**).
-   If the expression on the right is a name, the name on the left
    becomes an alias.
-   In R (almost) all objects behave the same as the vast majority are
    immutable.
-   In Python the object’s type determines its behavior.
-   Specifically, operations on immutable types create a new object
    instead of changing the existing one.
-   But for mutable types the object is modified in place.
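
A minimal sketch of the difference, using the `is` operator to check whether two names refer to the same object. All names here are hypothetical.

```python
a = [1, 2, 3]
b = a                # Alias: a second name for the same list object
c = a.copy()         # Copy: a new list object with the same elements

same_object = b is a           # True
copy_is_distinct = c is not a  # True
equal_values = c == a          # True

b.append(4)          # Modifying through the alias changes the shared object
# a is now [1, 2, 3, 4]; c is still [1, 2, 3]

text = 'test'
other = text         # Alias to an immutable string
text = text + '!'    # Creates a new string and rebinds `text`; `other` is still 'test'
```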

## Copying - Immutable Types

-   Recall **copy-on-modify** semantics from R.
-   Immutable types in Python behave similarly.

. . .

In [None]:
x = 'test' # Object of type string is assigned to variable `x`
id(x) # Function `id` returns the object's identity (its memory address in CPython)

138273890090080

. . .

In [None]:
y = x # `y` is created as an alias (alternative name) of `x`
id(y)

138273890090080

. . .

In [None]:
x = 'rest' # Another object of type string is assigned to `x`
x

'rest'

. . .

In [None]:
id(x)

138273898961600

. . .

In [None]:
y

'test'

. . .

In [None]:
id(y)

138273890090080

## Copying - Mutable Types

-   Mutable types in Python, however, behave differently.
-   Changing the object modifies it in place without copying.

. . .

In [None]:
d

{'apple': 150.0, 'banana': 120.0, 'watermelon': 3000.0, 'strawberry': 12.0}

. . .

In [None]:
d1 = d # Just an alias
d2 = d.copy() # Create a copy

. . .

In [None]:
d1['watermelon'] = 500 # Modify the original dictionary through its alias

. . .

In [None]:
d1

{'apple': 150.0, 'banana': 120.0, 'watermelon': 500, 'strawberry': 12.0}

. . .

In [None]:
d

{'apple': 150.0, 'banana': 120.0, 'watermelon': 500, 'strawberry': 12.0}

. . .

In [None]:
d2

{'apple': 150.0, 'banana': 120.0, 'watermelon': 3000.0, 'strawberry': 12.0}

## Summary of Built-in Object Types

> **Extra**
>
> [Python documentation on built-in
> types](https://docs.python.org/3/library/stdtypes.html)

## Modules

-   Python’s power lies in its extensibility.
-   This is usually achieved by loading additional modules (libraries).
-   A module can be just a `.py` file that you import into your program
    (script).
-   However, often this refers to external libraries installed using
    `pip` or `conda`.
-   Standard Python installation also includes a number of modules (full
    list [here](https://docs.python.org/3/library/index.html)).
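
The common forms of the `import` statement are sketched below with two standard-library modules. The alias `st` is an arbitrary choice.

```python
import math                     # Import the whole module; access names via `math.`
import statistics as st         # Import a module under a shorter alias
from statistics import median   # Import a single name directly

root = math.sqrt(16)            # 4.0
average = st.mean([1, 2, 3])    # 2
middle = median([1, 3, 2])      # 2
```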

## Basic Statistical Operations

-   Unlike R, Python has few statistical functions built in; basic ones
    come from the standard library's `statistics` module.

In [None]:
import statistics # Part of the Python standard library
lst = [0, 1, 1, 2, 3, 5]

. . .

In [None]:
statistics.mean(lst) # Mean

2

. . .

In [None]:
statistics.median(lst) # Median

1.5

. . .

In [None]:
statistics.mode(lst) # Mode

1

. . .

In [None]:
statistics.stdev(lst) # Standard deviation

1.7888543819998317
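
To see what `statistics.stdev` computes, the sketch below rebuilds the sample standard deviation (with the n - 1 denominator) from built-in functions. The intermediate names are made up for the example.

```python
import math
import statistics

lst = [0, 1, 1, 2, 3, 5]
n = len(lst)

mean = sum(lst) / n                                  # 2.0
squared_deviations = [(x - mean) ** 2 for x in lst]  # Squared distance of each value from the mean
sample_variance = sum(squared_deviations) / (n - 1)  # Sample variance with the n - 1 denominator
sample_sd = math.sqrt(sample_variance)

# The hand-made value agrees with the library function up to floating-point error
matches = math.isclose(sample_sd, statistics.stdev(lst))
```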

## Help!

Python has an inbuilt help facility which provides more information
about any object:

In [None]:
# `?s` works only in IPython/Jupyter; plain Python reports a syntax error
?s

invalid syntax (<string>, line 1)

In [None]:
help(s.join)

Help on built-in function join:

join(iterable, /) method of builtins.str instance
    Concatenate any number of strings.

    The string whose method is called is inserted in between each given string.
    The result is returned as a new string.

    Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'
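
Besides `help()`, the built-in `dir()` function and the `__doc__` attribute give programmatic access to similar information. A small sketch with illustrative variable names:

```python
s = 'time flies like a banana'

# dir() lists the attribute names of an object; keep only the public ones
public_names = [name for name in dir(s) if not name.startswith('_')]
has_upper = 'upper' in public_names   # True

# The text shown by help() comes from the docstring
upper_doc = s.upper.__doc__
```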

. . .

-   The quality of the documentation varies hugely across libraries.
-   [Stack Overflow](https://stackoverflow.com/) is a good resource for
    many standard tasks.
-   For custom packages it is often helpful to check the **issues** page
    on [GitHub](https://github.com/).
-   E.g. for `pandas`: <https://github.com/pandas-dev/pandas/issues>
-   Or, indeed, any search engine [#LMDDGTFY](https://lmddgtfy.net/)

## Next

-   Tutorial: Python objects, types and basic operations
-   Next week: Control flow and functions in Python