Time to Code Part II

Robustness

Comprehensive Error Management

Error Management

Protect the User:

  • Make assumptions and expectations explicit.
    • Check values before processing them.
    • Identify and manage exceptions.
  • Produce errors when expectations are not met.
  • Consider error handling options:
    • Redirect the program flow.
    • Log or report the error to allow the user or developer to troubleshoot.
    • If necessary, abort the execution.
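
The three options above can be sketched in Python as follows. This is a minimal illustration rather than a prescribed pattern: the helper `parse_cholesterol`, the logger name `risk_model`, and the fallback value are hypothetical names chosen for this example.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("risk_model")  # hypothetical logger name


def parse_cholesterol(raw_value, default=None):
    """Convert a raw input to a float, handling failures explicitly."""
    try:
        return float(raw_value)
    except (TypeError, ValueError) as exc:
        # Log the error so the user or developer can troubleshoot.
        logger.warning("Could not parse cholesterol value %r: %s", raw_value, exc)
        if default is not None:
            # Redirect the program flow to a fallback value.
            return default
        # Abort, because there is no sensible way to continue.
        raise SystemExit(f"Aborting: invalid cholesterol value {raw_value!r}")


print(parse_cholesterol("180"))                 # parsed normally
print(parse_cholesterol("n/a", default=200.0))  # logged, then falls back
```

Aborting is kept as the last resort here: it only happens when the caller has not supplied a fallback.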

Advanced Robustness: Unit Tests

Protect the Developer (You!):

  • Test the expected behavior of your functions:
    • Confirm that a known input produces the expected output.
    • Ensure that errors are produced as expected when invalid inputs are provided.
  • Capture unexpected errors to identify further opportunities for error management.
  • Automate test runs with Continuous Integration (CI) so that they execute whenever you push to a hosting platform such as GitHub.
  • Unit tests are especially valuable as your project grows in complexity.

More on tests later…

Throwing an Error (Python Example)

Suppose you have a function that calculates the risk of a cardiovascular event based on patient data.

def calculate_cardiovascular_risk(age, cholesterol_level):
    if cholesterol_level <= 0:
        raise ValueError("Cholesterol level must be positive.")
    risk = age * 0.2 + cholesterol_level * 0.8
    return risk

# This will raise an error
calculate_cardiovascular_risk(age=55, cholesterol_level=-180)

Why not simply adjust the function output?

def calculate_cardiovascular_risk(age, cholesterol_level):
    if cholesterol_level <= 0:
        return None
    risk = age * 0.2 + cholesterol_level * 0.8
    return risk

# This will return None, which may be ambiguous
print(calculate_cardiovascular_risk(age=55, cholesterol_level=-180))

Because a None return value is ambiguous: the caller cannot tell whether it is expected behavior or a sign of a problem. Explicit errors help identify issues promptly.
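
A short sketch of how a silent None can surface far from its cause. The function name `risk_or_none` and the sample values are made up for illustration; the formula is the one used above.

```python
def risk_or_none(age, cholesterol_level):
    # Returns None instead of raising, as in the second version above.
    if cholesterol_level <= 0:
        return None
    return age * 0.2 + cholesterol_level * 0.8


# The invalid second entry goes unnoticed at the point where it is created...
scores = [risk_or_none(55, 180), risk_or_none(60, -180)]

# ...and only fails later, in code that knows nothing about cholesterol.
try:
    mean_score = sum(scores) / len(scores)
except TypeError as e:
    downstream_error = str(e)
    print(f"Failure far from the cause: {downstream_error}")
```

The error message that finally appears mentions types, not cholesterol, which makes the original problem harder to trace.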

Warning Messages Without Breaking Execution (R Example)

An error stops code execution:

calculate_cardiovascular_risk <- function(age, cholesterol_level) {
  if (cholesterol_level <= 0) {
    stop("Cholesterol level must be positive.")
  }
  risk <- age * 0.2 + cholesterol_level * 0.8
  return(risk)
}

# This will throw an error
calculate_cardiovascular_risk(age = 55, cholesterol_level = -180)

Issue a warning instead of an error, so that execution continues:

calculate_cardiovascular_risk <- function(age, cholesterol_level) {
  if (cholesterol_level <= 0) {
    warning("Cholesterol level must be positive.")
    return(NA)
  }
  risk <- age * 0.2 + cholesterol_level * 0.8
  return(risk)
}

# This will issue a warning and return NA
calculate_cardiovascular_risk(age = 55, cholesterol_level = -180)
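
For comparison, Python offers a similar mechanism through the standard-library `warnings` module. The sketch below mirrors the R function above; using `float("nan")` as the stand-in for `NA` is an assumption of this example.

```python
import math
import warnings


def calculate_cardiovascular_risk(age, cholesterol_level):
    if cholesterol_level <= 0:
        # Emit a warning but let the caller's script keep running.
        warnings.warn("Cholesterol level must be positive.", UserWarning)
        return float("nan")
    return age * 0.2 + cholesterol_level * 0.8


# This issues a warning and returns NaN instead of stopping the script
risk = calculate_cardiovascular_risk(age=55, cholesterol_level=-180)
print(math.isnan(risk))
```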

Redirecting with Exceptions (Python)

If you do not want to interrupt your script when an error is raised, use try and except blocks.

def calculate_cardiovascular_risk(age, cholesterol_level):
    if cholesterol_level <= 0:
        raise ValueError("Cholesterol level must be positive.")
    risk = age * 0.2 + cholesterol_level * 0.8
    return risk

try:
    risk_score = calculate_cardiovascular_risk(55, -180)
except ValueError as e:
    print(f"Error encountered: {e}")
    # Handle the error, perhaps by setting a default value or skipping this entry
    risk_score = None
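
In practice this pattern is often applied inside a loop, so that one bad record does not stop the whole run. A minimal sketch, assuming a hypothetical list of (age, cholesterol_level) tuples named `patients`:

```python
def calculate_cardiovascular_risk(age, cholesterol_level):
    if cholesterol_level <= 0:
        raise ValueError("Cholesterol level must be positive.")
    return age * 0.2 + cholesterol_level * 0.8


patients = [(55, 180), (60, -180), (47, 210)]  # hypothetical sample records

risk_scores = []
skipped = []
for index, (age, cholesterol_level) in enumerate(patients):
    try:
        risk_scores.append(calculate_cardiovascular_risk(age, cholesterol_level))
    except ValueError as e:
        # Record which entry failed and why, then move on to the next one.
        skipped.append((index, str(e)))

print(f"Computed {len(risk_scores)} scores, skipped {len(skipped)} records")
```

Keeping the skipped entries in a list means the problems can still be reported at the end instead of being lost.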

Redirecting with Exceptions (R)

In R, you can use tryCatch() to handle exceptions gracefully.

calculate_cardiovascular_risk <- function(age, cholesterol_level) {
  if (cholesterol_level <= 0) {
    stop("Cholesterol level must be positive.")
  }
  risk <- age * 0.2 + cholesterol_level * 0.8
  return(risk)
}

result <- tryCatch({
  calculate_cardiovascular_risk(55, -180)
}, error = function(e) {
  message("Error encountered: ", e$message)
  # Handle the error, perhaps by setting a default value or skipping this entry
  return(NA)
})

print(result)

Validating Input

Consider adding checks early in your script to validate input data.

Python Example:

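# patient_data and patient_age are assumed to be defined earlier in the script.
# This emptiness check works for lists and dicts; for a pandas DataFrame, test patient_data.empty instead.
if not patient_data:
    raise ValueError("Patient data cannot be empty.")

if not isinstance(patient_age, int) or patient_age <= 0:
    raise ValueError("Patient age must be a positive integer.")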

R Example:

if (nrow(patient_data) == 0) {
  stop("Patient data cannot be empty.")
}

if (!is.numeric(patient_age) || patient_age <= 0) {
  stop("Patient age must be a positive number.")
}

Expectations and Assumptions

Anticipate Potential Issues:

  • Invalid Input Values:
    • Users may input unrealistic or impossible values (e.g., negative ages, probabilities greater than 1).
  • Incomplete or Missing Data:
    • Essential data fields may be missing or contain null values.
  • Incorrect Data Types:
    • Numerical fields may be input as strings, or categorical variables may not match expected categories.
  • Edge Cases:
    • Inputs at the extremes of acceptable ranges may cause unexpected behavior.
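
Each of the four categories above can be turned into a concrete check. The sketch below assumes a hypothetical patient record stored as a dictionary with `age` and `cholesterol_level` keys; the function name `validate_patient_record` and the ranges are illustrative choices.

```python
def validate_patient_record(record):
    """Return a list of problems found in a patient record (empty if none)."""
    problems = []
    limits = {"age": (0, 120), "cholesterol_level": (100, 400)}  # illustrative ranges
    for field, (low, high) in limits.items():
        value = record.get(field)
        if value is None:
            # Incomplete or missing data
            problems.append(f"{field} is missing")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            # Incorrect data type (e.g., a number entered as a string)
            problems.append(f"{field} must be numeric, got {type(value).__name__}")
        elif not (low <= value <= high):
            # Invalid value; the comparison is inclusive, so boundary values pass
            problems.append(f"{field} must be between {low} and {high}")
    return problems


print(validate_patient_record({"age": 55, "cholesterol_level": 180}))
print(validate_patient_record({"age": "55"}))
print(validate_patient_record({"age": 120, "cholesterol_level": 401}))
```

Collecting all problems before reporting lets the user fix a record in one pass instead of one error at a time.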


Best Practices for Handling Assumptions

Make Assumptions Explicit:

  • Documentation:
    • Clearly state all assumptions in your code comments and documentation.
    • Include acceptable input ranges, data types, and expected formats.
  • Input Validation:
    • Implement checks to validate input data before processing.
    • Provide informative error messages to guide users in correcting input.

Example: Documenting Assumptions

  • In your README.md or documentation:
    • “This model assumes that patient ages are between 0 and 120 years.”
    • “Cholesterol levels must be provided in mg/dL and be within the range of 100 to 400 mg/dL.”
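
The same assumptions can also live next to the code, for example in a docstring. The sketch below shows one possible layout; the docstring wording is illustrative, while the ranges and units are those stated above.

```python
def calculate_cardiovascular_risk(age, cholesterol_level):
    """Calculate a cardiovascular risk score.

    Assumptions:
        age: patient age in years, between 0 and 120.
        cholesterol_level: cholesterol in mg/dL, between 100 and 400.

    Raises:
        ValueError: if either input falls outside its documented range.
    """
    if not (0 <= age <= 120):
        raise ValueError("Age must be between 0 and 120.")
    if not (100 <= cholesterol_level <= 400):
        raise ValueError("Cholesterol level must be between 100 and 400 mg/dL.")
    return age * 0.2 + cholesterol_level * 0.8


# The documented assumptions are available to anyone inspecting the function.
print(calculate_cardiovascular_risk.__doc__)
```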

Implementing Input Validation

Python Example:

def calculate_cardiovascular_risk(age, cholesterol_level):
    # Validate inputs
    if not (0 <= age <= 120):
        raise ValueError("Age must be between 0 and 120.")
    if not (100 <= cholesterol_level <= 400):
        raise ValueError("Cholesterol level must be between 100 and 400 mg/dL.")
    # Proceed with calculation
    risk = age * 0.2 + cholesterol_level * 0.8
    return risk

# Usage
try:
    risk_score = calculate_cardiovascular_risk(55, 180)
except ValueError as e:
    print(f"Input error: {e}")

R Example:

calculate_cardiovascular_risk <- function(age, cholesterol_level) {
  # Validate inputs
  if (age < 0 || age > 120) {
    stop("Age must be between 0 and 120.")
  }
  if (cholesterol_level < 100 || cholesterol_level > 400) {
    stop("Cholesterol level must be between 100 and 400 mg/dL.")
  }
  # Proceed with calculation
  risk <- age * 0.2 + cholesterol_level * 0.8
  return(risk)
}

# Usage
risk_score <- tryCatch({
  calculate_cardiovascular_risk(55, 180)
}, error = function(e) {
  message("Input error: ", e$message)
  NA
})

Testing Your Functions

Implement Unit Tests to Verify Behavior:

  • Use Testing Frameworks:
    • Python: unittest, pytest
    • R: testthat
  • Create Test Cases for:
    • Valid inputs (expected to succeed)
    • Invalid inputs (expected to raise errors)
    • Edge cases (e.g., inputs at the boundary of acceptable ranges)

Example: Python Unit Test with unittest

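import unittest

# Assumes calculate_cardiovascular_risk (the validated version shown earlier)
# is defined in this file or imported from your own module.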

class TestCardiovascularRiskCalculation(unittest.TestCase):
    def test_valid_inputs(self):
        self.assertAlmostEqual(calculate_cardiovascular_risk(55, 180), 55 * 0.2 + 180 * 0.8)

    def test_invalid_age(self):
        with self.assertRaises(ValueError):
            calculate_cardiovascular_risk(-1, 180)

    def test_invalid_cholesterol(self):
        with self.assertRaises(ValueError):
            calculate_cardiovascular_risk(55, 500)

if __name__ == '__main__':
    unittest.main()
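
The tests above cover valid and invalid inputs but not the boundaries themselves. Below is a sketch of edge-case tests using `unittest` and `subTest`; the function is repeated only so the block runs on its own, and the class name `TestBoundaryValues` is hypothetical.

```python
import unittest


def calculate_cardiovascular_risk(age, cholesterol_level):
    if not (0 <= age <= 120):
        raise ValueError("Age must be between 0 and 120.")
    if not (100 <= cholesterol_level <= 400):
        raise ValueError("Cholesterol level must be between 100 and 400 mg/dL.")
    return age * 0.2 + cholesterol_level * 0.8


class TestBoundaryValues(unittest.TestCase):
    def test_values_on_the_boundary_are_accepted(self):
        for age, cholesterol in [(0, 100), (120, 400)]:
            with self.subTest(age=age, cholesterol=cholesterol):
                expected = age * 0.2 + cholesterol * 0.8
                self.assertAlmostEqual(
                    calculate_cardiovascular_risk(age, cholesterol), expected
                )

    def test_values_just_outside_are_rejected(self):
        for age, cholesterol in [(-1, 100), (121, 400), (55, 99), (55, 401)]:
            with self.subTest(age=age, cholesterol=cholesterol):
                with self.assertRaises(ValueError):
                    calculate_cardiovascular_risk(age, cholesterol)


# Run the tests programmatically so the result object can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestBoundaryValues)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Using `subTest` reports each failing boundary value separately instead of stopping at the first one.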

Example: R Unit Test with testthat

library(testthat)

test_that("Valid inputs return correct risk", {
  expect_equal(calculate_cardiovascular_risk(55, 180), 55 * 0.2 + 180 * 0.8)
})

test_that("Invalid age throws an error", {
  expect_error(calculate_cardiovascular_risk(-1, 180), "Age must be between 0 and 120")
})

test_that("Invalid cholesterol level throws an error", {
  expect_error(calculate_cardiovascular_risk(55, 500), "Cholesterol level must be between 100 and 400")
})

➡️ Exercise

Task:

Review your existing codebase for any functions that accept input data.

Steps:

  1. Identify Functions with Inputs:
    • Locate functions that take input data as arguments.
    • Note any assumptions or requirements for these inputs.
  2. Implement Input Validation:
    • Add checks to ensure inputs meet expected criteria.
    • Provide clear and informative error messages for invalid inputs.
  3. Document Assumptions:
    • Update your documentation to include all assumptions and input requirements.
  4. Write Unit Tests:
    • Create tests for both valid and invalid inputs.
    • Ensure that your functions behave as expected in all cases.