Week 5 Tutorial: Debugging and Testing in R

POP77001 Computer Programming for Social Scientists

Debugging with print()

  • print() statement can be used to check the internal state of a program during evaluation.
  • Can be placed in critical parts of code (before or after loops/function calls/objects loading).
  • For harder cases switch to R debugger.

Exercise: Debugging

  • See the function for calculating Pearson correlation below.
  • Recall that sample correlation can be calculated using this formula:

\[ r_{xy} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}\sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}} \]

where \(\bar{x}\) and \(\bar{y}\) are the means (averages) of variable \(x\) and \(y\), respectively.

  • What do you think is the correlation coefficient between vectors c(1, 2, 3, 4, 5) and c(-3, -5, -7, -9, -11)?
  • Check the output of the function, is it correct?
  • Find and fix any problems that you encounter

pearson <- function(vec_x, vec_y) {
  if (!(is.numeric(vec_x) && is.numeric(vec_y))) {
    stop("Both arguments must be numeric")
  }
  
  mean_x <- sum(vec_x)/length(vec_x)
  mean_y <- sum(vec_y)/length(vec_y)
  
  numerator <- sum((vec_x - mean_x) * (vec_y - mean_y))
  denominator <- (
    sum((vec_x - mean_x)^2)^1/2 *
      sum((vec_y - mean_y)^2)^1/2
  )
  
  r <- numerator/denominator
  
  # Make sure that floating point arithmetic does not
  # produce absolute values larger than 1
  r <- max(min(r, 1.0), -1.0)
  
  return(r)
}

R debugger

  • In addition to simply using print() function, R offers an interactive source code debugger.
  • It lets you
    • Step through the function at its execution time
    • Check the internal state as well as
    • Run run arbitrary code in that context
    • Set breakpoints when execution pauses for inspection

R Debugging Facilities

  • Three functions provide the the main entries into the debugging mode:
    • browser() - pauses the execution at a dedicated line in code (breakpoint)
    • debug()/undebug() - (un)sets a flag to run function in a debug mode (setting through)
    • debugonce() - triggers single stepping through a function

Breakpoints

calculate_median <- function(a) {
  a <- sort(a)
  n <- length(a)
  m <- (n + 1) %/% 2
  if (n %% 2 == 1) {
    med <- a[m]
  } else {
    browser()
    med <- mean(a[m:m+1])
  }
  return(med)
}
## Example for running in RStudio
v2 <- c(0, 1, 2, 2)
calculate_median(v2)
Called from: calculate_median(v2)
debug: med <- mean(a[m:m + 1])
debug: return(med)
[1] 2

Debugger Commands

Command Description
n(ext) Execute next line of the current function
s(tep) Execute next line, stepping inside the function (if present)
c(ontinue) Continue execution, only stop when breakpoint in encountered
f(inish) Finish execution of the current loop or function
Q(uit) Quit from the debugger, executed program is aborted

debugonce()

  • debugonce() function allows to run and step through the function.
  • It is equivalent to setting a breakpoint at the first line of the function and then running it.
debugonce(<function_name>, <arg_1>, <arg_2>, ..., <arg_n>)

## Example for running in RStudio
debugonce(calculate_median)
calculate_median(v2)
debugging in: calculate_median(v2)
debug: {
    a <- sort(a)
    n <- length(a)
    m <- (n + 1)%/%2
    if (n%%2 == 1) {
        med <- a[m]
    }
    else {
        browser()
        med <- mean(a[m:m + 1])
    }
    return(med)
}
debug: a <- sort(a)
debug: n <- length(a)
debug: m <- (n + 1)%/%2
debug: if (n%%2 == 1) {
    med <- a[m]
} else {
    browser()
    med <- mean(a[m:m + 1])
}
debug: browser()
debug: med <- mean(a[m:m + 1])
debug: return(med)
exiting from: calculate_median(v2)
[1] 2

Exercise: Using Debugger

  • Let’s look again at the problematic calculate_median function from the lecture.
  • Run R debugger and step through it.
  • While inside the function print out the values of m and the result of summation.
  • Fix the bugs.

Week 5 Exercise (unassessed)

  • Create tests for the fixed version of pearson() function that:
    • Check that the length of the result is 1;
    • Check that the result is between -1 and 1;
    • Check that the function errors out on non-numeric inputs.