Week 4: Debugging and Testing¶

Python for Social Data Science¶

Tom Paskhalis¶

Overview¶

  • Software bugs
  • Debugging
  • Exception handling
  • Testing
  • Defensive programming

Bugs¶

Source: Giphy

Computer bugs before¶

Grace Murray Hopper popularised the term bug after in 1947 her team traced an error in the Mark II to a moth trapped in a relay.

Source: US Naval History and Heritage Command

Computer bugs today¶

In [1]:
def even_or_odd(num):
    if num % 2 == 0:
        return 'even'
    else:
        return 'odd'
In [2]:
even_or_odd(42.7)
Out[2]:
'odd'
In [3]:
even_or_odd('42')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [3], line 1
----> 1 even_or_odd('42')

Cell In [1], line 2, in even_or_odd(num)
      1 def even_or_odd(num):
----> 2     if num % 2 == 0:
      3         return 'even'
      4     else:

TypeError: not all arguments converted during string formatting

Explicit expectations¶

In [4]:
def even_or_odd(num):
    num = int(num) # We expect input to be integer or convertible into one
    if num % 2 == 0:
        return 'even'
    else:
        return 'odd'
In [5]:
even_or_odd(42.7)
Out[5]:
'even'
In [6]:
even_or_odd('42')
Out[6]:
'even'

Types of bugs¶

  • Overt vs covert
    • Overt bugs have obvious manifestation (e.g. premature program termination, crash)
    • Covert bugs manifest themselves in wrong (unexpected) results
  • Persistent vs intermittent
    • Persistent bugs occur for every run of the program with the same input
    • Intermittent bugs occur occasionally even given the same input and other conditions

Debugging¶

Source: Giphy

Debugging¶

Fixing a buggy program is a process of confirming, one by one, that the many things you believe to be true about the code actually are true. When you find that one of your assumptions is not true, you have found a clue to the location (if not the exact nature) of a bug.

Norman Matloff

When you have eliminated all which is impossible, then whatever remains, however improbable, must be the truth.

Arthur Conan Doyle

  • Process of finding, isolating and fixing an existing problem in computer program

Debugging process¶

  1. Realise that you have a bug
    • Could be non-trivial for covert and intermittent bugs
  2. Make it reproducible
    • Extremely important step that makes debugging easier
    • Isolate the smallest snippet of code that repeats the bug
    • Test with different inputs/objects
    • Will also be helpful if you are seeking outside help
    • Provides a case that can be used in automated testing
  3. Figure out where it is
    • Formulate hypotheses, design experiments
    • Test hypotheses on a reproducible example
    • Keep track of the solutions that you have attempted
  4. If it worked:
    • Fix the bug and test the use-case
  5. Otherwise:
    • Sleep on it

Debugging process continued¶

Source: Julia Evans

Debugging with print()¶

  • print() statement can be used to check the internal state of a program during evaluation
  • Can be placed in critical parts of code (before or after loops/function calls/objects loading)
  • Can be combined with functions vars() or locals() to reveal all local objects
  • For harder cases switch to Python debugger (pdb)

Extra: Tips for debugging with print()

Bug example¶

In [7]:
def calculate_median(lst):
    lst.sort()
    n = len(lst)
    m = (n + 1)//2
    if n % 2 == 1:
        median = lst[m]
    else:
        median = sum(lst[m-1:m])/2
    return median
In [8]:
l1 = [1, 2, 3]
l2 = [0, 1, 1, 2]
In [9]:
calculate_median(l1)
Out[9]:
3
In [10]:
calculate_median(l2)
Out[10]:
0.5

Debugging with print() example¶

In [11]:
def calculate_median(lst):
    lst.sort()
    n = len(lst)
    m = (n + 1)//2
    print(m)
    if n % 2 == 1:
        median = lst[m]
    else:
        median = sum(lst[m-1:m])/2
    return median
In [12]:
l1 = [1, 2, 3]
l2 = [0, 1, 1, 2]
In [13]:
calculate_median(l1)
2
Out[13]:
3
In [14]:
calculate_median(l2)
2
Out[14]:
0.5

Debugging with print() and vars() example¶

In [15]:
def calculate_median(lst):
    lst.sort()
    n = len(lst)
    m = (n + 1)//2
    print(vars())
    if n % 2 == 1:
        median = lst[m]
    else:
        median = sum(lst[m-1:m])/2
    return median
In [16]:
l1 = [1, 2, 3]
l2 = [0, 1, 1, 2]
In [17]:
calculate_median(l1)
{'lst': [1, 2, 3], 'n': 3, 'm': 2}
Out[17]:
3
In [18]:
calculate_median(l2)
{'lst': [0, 1, 1, 2], 'n': 4, 'm': 2}
Out[18]:
0.5

Debugging with print() example continued¶

In [19]:
def calculate_median(lst):
    lst.sort()
    n = len(lst)
    m = (n + 1)//2
    print(m)
    if n % 2 == 1:
        median = lst[m-1]
    else:
        print(sum(lst[m-1:m]))
        median = sum(lst[m-1:m])/2
    return median
In [20]:
l1 = [1, 2, 3]
l2 = [0, 1, 1, 2]
In [21]:
calculate_median(l1)
2
Out[21]:
2
In [22]:
calculate_median(l2)
2
1
Out[22]:
0.5

Exceptions¶

  • Exceptions are events that can modify the control flow of a program
  • In Python exceptions are automatically triggered (raised) on errors
  • They can be caught and handled by your code
  • You can also incorporate exception triggers into your code

Extra: Python documentation on errors and exceptions

Exception examples¶

In [23]:
77001 23 + # Raises an exception 'SyntaxError'
  Cell In [23], line 1
    77001 23 + # Raises an exception 'SyntaxError'
          ^
SyntaxError: invalid syntax
In [24]:
'4' < 3 # Raises an exception 'TypeError'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [24], line 1
----> 1 '4' < 3

TypeError: '<' not supported between instances of 'str' and 'int'
In [25]:
l = [0, 1, 2, 3]
l[4] # Raises an exception 'IndexError'
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In [25], line 2
      1 l = [0, 1, 2, 3]
----> 2 l[4]

IndexError: list index out of range

Exception handling¶

  • Exceptions that are raised by Python intepreter and are not handled by the program are terminal
  • Python provides try-except construct for catching and handling exceptions
try:
    <code_block>
except:
    <exception_code_block>

Exception handling example¶

In [26]:
def even_or_odd(num):
    try:
        num = int(num)
    except:
        print('Input cannot be converted into integer')
        return None
    if num % 2 == 0:
        return 'even'
    else:
        return 'odd'
In [27]:
even_or_odd('forty-two')
Input cannot be converted into integer
In [28]:
even_or_odd([0, 1, 2])
Input cannot be converted into integer

Handling specific exceptions¶

  • Instead of using blanket approach to exceptions, it is possible to program different courses of action depending on exception type
try:
    <code_block>
except <exception_name_1>:
    <exception_code_block_2>
except <exception_name_2>:
    <exception_code_block_2>
...
except <exception_name_n> as <variable>:
    <exception_code_block_n>

Handling specific exceptions example¶

In [29]:
def even_or_odd(num):
    try:
        num = int(num)
    except ValueError:
        print('Input cannot be converted into integer')
        return None
    if num % 2 == 0:
        return 'even'
    else:
        return 'odd'
In [30]:
even_or_odd('forty-two')
Input cannot be converted into integer
In [31]:
even_or_odd([0, 1, 2])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [31], line 1
----> 1 even_or_odd([0, 1, 2])

Cell In [29], line 3, in even_or_odd(num)
      1 def even_or_odd(num):
      2     try:
----> 3         num = int(num)
      4     except ValueError:
      5         print('Input cannot be converted into integer')

TypeError: int() argument must be a string, a bytes-like object or a real number, not 'list'

Handling specific exceptions example continued¶

In [32]:
def even_or_odd(num):
    try:
        num = int(num)
    except ValueError:
        print('Input cannot be converted into integer')
        return None
    except TypeError as msg: # Exception can also be assigned to a variable
        print(msg)
        return None
    if num % 2 == 0:
        return 'even'
    else:
        return 'odd'
In [33]:
even_or_odd('forty-two')
Input cannot be converted into integer
In [34]:
even_or_odd([0, 1, 2])
int() argument must be a string, a bytes-like object or a real number, not 'list'

Extended exception handling¶

  • else after try-except construct allows to execute arbitrary code if no exception had been raised
  • finally allows to execute some code block regardless of the result of try block
try:
    <code_block>
except <exception_name_1>:
    <exception_code_block_2>
except (<exception_name_2>, <exception_name_3>):
    <exception_code_block_2_3>
...
except <exception_name_n> as <variable>:
    <exception_code_block_n>
else:
    <alternative_code_block>
finally:
    <another_code_block>

Extended exception handling example¶

In [35]:
def even_or_odd(num):
    try:
        num = int(num)
    except ValueError:
        print('Input cannot be converted into integer')
        return None
    except TypeError as msg:
        print(msg)
        return None
    else:
        print('Checking ' + str(num)) # Code block gets executed if no exception had been raised
    if num % 2 == 0:
        return 'even'
    else:
        return 'odd'
In [36]:
even_or_odd('forty-two')
Input cannot be converted into integer
In [37]:
even_or_odd(42.7)
Checking 42
Out[37]:
'even'

Extended exception handling example continued¶

In [38]:
def even_or_odd(num):
    def cast_int(num):
        try:
            new_num = int(num)
        except ValueError:
            print('Input cannot be converted into integer')
        except TypeError as msg:
            print(msg)
        else:
            print('Converted '+ str(num) + ' to ' + str(new_num))
            return new_num
    num = cast_int(num)
    if num is not None:
        if num % 2 == 0:
            return 'even'
        else:
            return 'odd'
In [39]:
even_or_odd('forty-two')
Input cannot be converted into integer
In [40]:
even_or_odd(42.7)
Converted 42.7 to 42
Out[40]:
'even'
In [41]:
even_or_odd([0, 1, 2])
int() argument must be a string, a bytes-like object or a real number, not 'list'

Raising exceptions¶

  • Python provides mechanisms not only for catching and handling exceptions
  • Exceptions can also be raised by programmer
  • Raising an exception is one of ways to control the program flow
  • raise statement is used to force a specified exception to occur
  • Exception can be one of the built-in types or defined by programmer
raise <exception_name>

or

raise <exception_name>(<message>)
In [42]:
raise IndexError
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In [42], line 1
----> 1 raise IndexError

IndexError: 

Raising exceptions example¶

In [43]:
def calculate_median(lst):
    for i in range(len(lst)):
        try:
            lst[i] = float(lst[i])
        except:
            raise ValueError('All elements of the list must be numeric')
    lst.sort()
    n = len(lst)
    m = (n + 1)//2
    if n % 2 == 1:
        median = lst[m-1]
    else:
        median = sum(lst[m-1:m+1])/2
    return median
In [44]:
l = [0, 'one', 1, 2]

Raising exceptions example continued¶

In [45]:
calculate_median(l)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [43], line 4, in calculate_median(lst)
      3 try:
----> 4     lst[i] = float(lst[i])
      5 except:

ValueError: could not convert string to float: 'one'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In [45], line 1
----> 1 calculate_median(l)

Cell In [43], line 6, in calculate_median(lst)
      4         lst[i] = float(lst[i])
      5     except:
----> 6         raise ValueError('All elements of the list must be numeric')
      7 lst.sort()
      8 n = len(lst)

ValueError: All elements of the list must be numeric

Common types of exceptions in Python¶

Exception Description
SyntaxError Parser encountered a syntax error
IndexError Sequence subscript is out of range
NameError Local or global name is not found
TypeError Operation or function is applied to an object of inappropriate type
ValueError Operation or function receives an argument that has the right type but an inappropriate value
OSError I/O failures such as “file not found” or “disk full”
ImportError import statement had problems with loading a module
AssertionError assert statement failed

Extra: Full list of Python exceptions

Discretion in exception handling¶

Source: Reddit

Assertion¶

  • Assertions can be used to check whether conditions are as expected
  • assert statement provides another way of raising exceptions if expectations are not met
  • Such statements are particularly useful in debugging
assert <boolean_expression>

or

assert <boolean_expression>, <message>
In [46]:
assert False, 'Nobody expects the Spanish Inquisition!'
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In [46], line 1
----> 1 assert False, 'Nobody expects the Spanish Inquisition!'

AssertionError: Nobody expects the Spanish Inquisition!

Assertion example¶

In [47]:
def is_positive(num):
    assert num != 0, 'Input must be non-zero'
    if num > 0:
        return True
    else:
        return False
In [48]:
is_positive(0)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In [48], line 1
----> 1 is_positive(0)

Cell In [47], line 2, in is_positive(num)
      1 def is_positive(num):
----> 2     assert num != 0, 'Input must be non-zero'
      3     if num > 0:
      4         return True

AssertionError: Input must be non-zero

Testing¶

  • Process of running a program on pre-determined cases to ascertain that its functionality is consistent with expectations
  • Test cases consist of different assertions (of equality, boolean values, etc.)
  • Fully-featured unit testing framework in Python is provided by built-in unittest module

Extra: Python documentation on unit testing

Testing example¶

In [49]:
def calculate_median(lst):
    lst.sort()
    n = len(lst)
    m = (n + 1)//2
    if n % 2 == 1:
        median = lst[m-1]
    else:
        median = sum(lst[m-1:m+1])/2
    return median
In [50]:
# Test the equality of the result of function call and some value
def test_equal(func, value):
    assert func == value
    print('Equality test passed')
In [51]:
l = [0, 1, 1, 2, 3]
In [52]:
test_equal(calculate_median(l), 1)
Equality test passed

Defensive programming¶

  • Design your program to facilitate earlier failures, testing and debugging
  • Split up different componenets into functions or modules
  • Be strict about accepted inputs, use assertions or conditional statements to check them
  • Document assumptions and acceptable inputs using docstrings
  • Document non-trivial, potentially problematic and complex parts of code

Next Week¶

  • Data Wrangling