Time to Code Part II

Robustness

Error Management

Protect the User:

Make assumptions and expectations explicit.
- Check values before processing them.
- Identify and manage exceptions.
Produce errors when expectations are not met.
Consider error handling options:
- Redirect the program flow.
- Log or report the error to allow the user or developer to troubleshoot.
- If necessary, abort the execution.
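
The three handling options above can be sketched in Python as follows. This is a minimal illustration, not a prescribed design: the function names, the logger name "risk_model", and the fallback behavior are all hypothetical.

```python
import logging
import sys

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("risk_model")  # hypothetical logger name


def parse_cholesterol(raw_value, default=None):
    """Redirect the flow: fall back to a default when parsing fails."""
    try:
        return float(raw_value)
    except (TypeError, ValueError) as exc:
        # Log the problem so the user or developer can troubleshoot later
        logger.warning("Could not parse cholesterol %r: %s", raw_value, exc)
        return default


def require_input_file(path):
    """Abort: stop the program when continuing makes no sense."""
    if not path:
        logger.error("No input file given; aborting.")
        sys.exit(1)
    return path
```

Which option fits depends on whether the program can still produce a meaningful result without the problematic value.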

Advanced Robustness: Unit Tests

Protect the Developer (You!):

Test the expected behavior of your functions:
- Confirm that a known input produces the expected output.
- Ensure that errors are produced as expected when invalid inputs are provided.
Capture unexpected errors to identify further opportunities for error management.
Automate test runs with Continuous Integration (CI), so that tests execute every time you push to a code hosting platform such as GitHub.
Unit tests are especially valuable as your project grows in complexity.

More on tests later…

Throwing an Error (Python Example)

Suppose you have a function that calculates the risk of a cardiovascular event based on patient data.

def calculate_cardiovascular_risk(age, cholesterol_level):
    if cholesterol_level <= 0:
        raise ValueError("Cholesterol level must be positive.")
    risk = age * 0.2 + cholesterol_level * 0.8
    return risk

# This will raise an error
calculate_cardiovascular_risk(age=55, cholesterol_level=-180)

Why not simply adjust the function output?

def calculate_cardiovascular_risk(age, cholesterol_level):
    if cholesterol_level <= 0:
        return None
    risk = age * 0.2 + cholesterol_level * 0.8
    return risk

# This will return None, which may be ambiguous
print(calculate_cardiovascular_risk(age=55, cholesterol_level=-180))

Because a None return value is ambiguous: the caller cannot tell whether it is expected behavior or a sign of a problem. Explicit errors surface issues promptly, at the point where they occur.
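
To illustrate the point, here is a small sketch of how a None result can travel through a script and fail somewhere else. The function name risk_or_none and the patient values are made up for this example.

```python
def risk_or_none(age, cholesterol_level):
    """Variant that hides invalid input by returning None."""
    if cholesterol_level <= 0:
        return None
    return age * 0.2 + cholesterol_level * 0.8


# Made-up (age, cholesterol_level) pairs; the second one is invalid
patients = [(55, 180), (60, -180), (40, 200)]
scores = [risk_or_none(age, chol) for age, chol in patients]

# The invalid record is only noticed here, far from where it originated
try:
    mean_score = sum(scores) / len(scores)
    failure = None
except TypeError as exc:
    failure = f"Failed downstream: {exc}"

print(failure)
```

The TypeError raised by sum() says nothing about cholesterol, so the real cause has to be traced back by hand.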

Warning Messages Without Breaking Execution (R Example)

An error stops code execution:

calculate_cardiovascular_risk <- function(age, cholesterol_level) {
  if (cholesterol_level <= 0) {
    stop("Cholesterol level must be positive.")
  }
  risk <- age * 0.2 + cholesterol_level * 0.8
  return(risk)
}

# This will throw an error
calculate_cardiovascular_risk(age = 55, cholesterol_level = -180)

Issue a warning instead of an error, so that execution continues:

calculate_cardiovascular_risk <- function(age, cholesterol_level) {
  if (cholesterol_level <= 0) {
    warning("Cholesterol level must be positive.")
    return(NA)
  }
  risk <- age * 0.2 + cholesterol_level * 0.8
  return(risk)
}

# This will issue a warning and return NA
calculate_cardiovascular_risk(age = 55, cholesterol_level = -180)
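
For comparison, a rough Python counterpart of the warning pattern can be written with the standard warnings module. The function name below is hypothetical, and returning NaN (as a stand-in for R's NA) is an assumption made for this sketch.

```python
import math
import warnings


def calculate_cardiovascular_risk_lenient(age, cholesterol_level):
    """Hypothetical lenient variant: warn and return NaN instead of raising."""
    if cholesterol_level <= 0:
        # stacklevel=2 attributes the warning to the caller's line
        warnings.warn("Cholesterol level must be positive; returning NaN.", stacklevel=2)
        return math.nan
    return age * 0.2 + cholesterol_level * 0.8


# Execution continues after the warning is issued
result = calculate_cardiovascular_risk_lenient(age=55, cholesterol_level=-180)
print(math.isnan(result))
```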

Redirecting with Exceptions (Python)

If you do not want to interrupt your script when an error is raised, use try and except blocks.

def calculate_cardiovascular_risk(age, cholesterol_level):
    if cholesterol_level <= 0:
        raise ValueError("Cholesterol level must be positive.")
    risk = age * 0.2 + cholesterol_level * 0.8
    return risk

try:
    risk_score = calculate_cardiovascular_risk(55, -180)
except ValueError as e:
    print(f"Error encountered: {e}")
    # Handle the error, perhaps by setting a default value or skipping this entry
    risk_score = None
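
The "skipping this entry" idea becomes clearer when processing several records in a loop. The sketch below assumes a made-up list of (patient_id, age, cholesterol_level) tuples; the record layout and variable names are illustrative only.

```python
def calculate_cardiovascular_risk(age, cholesterol_level):
    if cholesterol_level <= 0:
        raise ValueError("Cholesterol level must be positive.")
    return age * 0.2 + cholesterol_level * 0.8


# Made-up records: (patient_id, age, cholesterol_level)
records = [("p1", 55, 180), ("p2", 60, -180), ("p3", 40, 200)]

risk_scores = {}
skipped = []
for patient_id, age, cholesterol_level in records:
    try:
        risk_scores[patient_id] = calculate_cardiovascular_risk(age, cholesterol_level)
    except ValueError as e:
        # Record the failure and move on to the next entry
        skipped.append((patient_id, str(e)))

print(f"Computed {len(risk_scores)} scores; skipped {len(skipped)} records.")
```

Keeping a list of skipped records, rather than silently dropping them, leaves a trail for troubleshooting.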

Redirecting with Exceptions (R)

In R, you can use tryCatch() to handle exceptions gracefully.

calculate_cardiovascular_risk <- function(age, cholesterol_level) {
  if (cholesterol_level <= 0) {
    stop("Cholesterol level must be positive.")
  }
  risk <- age * 0.2 + cholesterol_level * 0.8
  return(risk)
}

result <- tryCatch({
  calculate_cardiovascular_risk(55, -180)
}, error = function(e) {
  message("Error encountered: ", e$message)
  # Handle the error, perhaps by setting a default value or skipping this entry
  return(NA)
})

print(result)

Validating Input

Validate input data early in your script, before any processing begins.

Python Example:

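# Assumes patient_data (a list or dict) and patient_age were loaded earlier in the script.
# For a pandas DataFrame, test patient_data.empty instead of "not patient_data".
if not patient_data: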
    raise ValueError("Patient data cannot be empty.")

if not isinstance(patient_age, int) or patient_age <= 0:
    raise ValueError("Patient age must be a positive integer.")

R Example:

if (nrow(patient_data) == 0) {
  stop("Patient data cannot be empty.")
}

if (!is.numeric(patient_age) || patient_age <= 0) {
  stop("Patient age must be a positive number.")
}

Expectations and Assumptions

Anticipate Potential Issues:

Invalid Input Values:
- Users may input unrealistic or impossible values (e.g., negative ages, probabilities greater than 1).
Incomplete or Missing Data:
- Essential data fields may be missing or contain null values.
Incorrect Data Types:
- Numerical fields may be input as strings, or categorical variables may not match expected categories.
Edge Cases:
- Inputs at the extremes of acceptable ranges may cause unexpected behavior.
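
One way to cover these four categories in a single place is a small validator that reports every problem it finds. This is only a sketch: the dictionary layout, the field names, and the inclusive bounds are assumptions chosen to match the examples above.

```python
# Assumed acceptable ranges (inclusive); adjust to your own model
BOUNDS = {"age": (0, 120), "probability": (0.0, 1.0)}


def validate_record(record):
    """Return a list of problems found in one record (hypothetical dict layout)."""
    problems = []
    for field, (low, high) in BOUNDS.items():
        value = record.get(field)
        if value is None:
            # Incomplete or missing data
            problems.append(f"{field}: missing")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            # Incorrect data type (bool is excluded although it subclasses int)
            problems.append(f"{field}: expected a number, got {type(value).__name__}")
        elif not (low <= value <= high):
            # Invalid value; the boundaries themselves are accepted
            problems.append(f"{field}: {value} is outside [{low}, {high}]")
    return problems


print(validate_record({"age": -3, "probability": "0.4"}))
```

Returning a list instead of raising on the first failure lets the user correct all problems in one pass.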

Best Practices for Handling Assumptions

Make Assumptions Explicit:

Documentation:
- Clearly state all assumptions in your code comments and documentation.
- Include acceptable input ranges, data types, and expected formats.
Input Validation:
- Implement checks to validate input data before processing.
- Provide informative error messages to guide users in correcting input.

Example: Documenting Assumptions

In your README.md or documentation:
- “This model assumes that patient ages are between 0 and 120 years.”
- “Cholesterol levels must be provided in mg/dL and be within the range of 100 to 400 mg/dL.”
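
The same assumptions can also live next to the code, for example in a Python docstring. The sketch below shows one possible layout; the section headings inside the docstring are a stylistic choice, not a requirement.

```python
def calculate_cardiovascular_risk(age, cholesterol_level):
    """Compute an illustrative risk score from age and cholesterol.

    Assumptions:
        age: patient age in years, between 0 and 120.
        cholesterol_level: cholesterol in mg/dL, between 100 and 400.

    Raises:
        ValueError: if either input falls outside its documented range.
    """
    if not (0 <= age <= 120):
        raise ValueError("Age must be between 0 and 120.")
    if not (100 <= cholesterol_level <= 400):
        raise ValueError("Cholesterol level must be between 100 and 400 mg/dL.")
    return age * 0.2 + cholesterol_level * 0.8


# The docstring is available at runtime, e.g. via help() or __doc__
print(calculate_cardiovascular_risk.__doc__)
```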

Implementing Input Validation

Python Example:

def calculate_cardiovascular_risk(age, cholesterol_level):
    # Validate inputs
    if not (0 <= age <= 120):
        raise ValueError("Age must be between 0 and 120.")
    if not (100 <= cholesterol_level <= 400):
        raise ValueError("Cholesterol level must be between 100 and 400 mg/dL.")
    # Proceed with calculation
    risk = age * 0.2 + cholesterol_level * 0.8
    return risk

# Usage
try:
    risk_score = calculate_cardiovascular_risk(55, 180)
except ValueError as e:
    print(f"Input error: {e}")

R Example:

calculate_cardiovascular_risk <- function(age, cholesterol_level) {
  # Validate inputs
  if (age < 0 || age > 120) {
    stop("Age must be between 0 and 120.")
  }
  if (cholesterol_level < 100 || cholesterol_level > 400) {
    stop("Cholesterol level must be between 100 and 400 mg/dL.")
  }
  # Proceed with calculation
  risk <- age * 0.2 + cholesterol_level * 0.8
  return(risk)
}

# Usage
risk_score <- tryCatch({
  calculate_cardiovascular_risk(55, 180)
}, error = function(e) {
  message("Input error: ", e$message)
  NA
})

Testing Your Functions

Implement Unit Tests to Verify Behavior:

Use Testing Frameworks:
- Python: unittest, pytest
- R: testthat
Create Test Cases for:
- Valid inputs (expected to succeed)
- Invalid inputs (expected to raise errors)
- Edge cases (e.g., inputs at the boundary of acceptable ranges)

Example: Python Unit Test with unittest

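import unittest

# Assumes calculate_cardiovascular_risk (the validated version above) is defined
# in this file or imported from your own module.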

class TestCardiovascularRiskCalculation(unittest.TestCase):
    def test_valid_inputs(self):
        self.assertAlmostEqual(calculate_cardiovascular_risk(55, 180), 55 * 0.2 + 180 * 0.8)

    def test_invalid_age(self):
        with self.assertRaises(ValueError):
            calculate_cardiovascular_risk(-1, 180)

    def test_invalid_cholesterol(self):
        with self.assertRaises(ValueError):
            calculate_cardiovascular_risk(55, 500)

if __name__ == '__main__':
    unittest.main()
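
The edge cases mentioned earlier deserve their own tests. The sketch below checks values exactly on, and just outside, the documented boundaries; it repeats the validated function so that it runs on its own, and the class and test names are hypothetical.

```python
import unittest


def calculate_cardiovascular_risk(age, cholesterol_level):
    # Same validated function as in the input validation example
    if not (0 <= age <= 120):
        raise ValueError("Age must be between 0 and 120.")
    if not (100 <= cholesterol_level <= 400):
        raise ValueError("Cholesterol level must be between 100 and 400 mg/dL.")
    return age * 0.2 + cholesterol_level * 0.8


class TestRiskBoundaries(unittest.TestCase):
    def test_boundary_values_are_accepted(self):
        for age, chol in [(0, 100), (120, 400), (0, 400), (120, 100)]:
            with self.subTest(age=age, chol=chol):
                self.assertAlmostEqual(
                    calculate_cardiovascular_risk(age, chol),
                    age * 0.2 + chol * 0.8,
                )

    def test_values_just_outside_are_rejected(self):
        for age, chol in [(-1, 100), (121, 100), (0, 99), (0, 401)]:
            with self.subTest(age=age, chol=chol):
                with self.assertRaises(ValueError):
                    calculate_cardiovascular_risk(age, chol)
```

Saved in a test file, these can be run with python -m unittest. Using subTest reports each failing pair separately instead of stopping at the first one.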

Example: R Unit Test with testthat

library(testthat)

test_that("Valid inputs return correct risk", {
  expect_equal(calculate_cardiovascular_risk(55, 180), 55 * 0.2 + 180 * 0.8)
})

test_that("Invalid age throws an error", {
  expect_error(calculate_cardiovascular_risk(-1, 180), "Age must be between 0 and 120")
})

test_that("Invalid cholesterol level throws an error", {
  expect_error(calculate_cardiovascular_risk(55, 500), "Cholesterol level must be between 100 and 400")
})

➡️ Exercise

Task:

Review your existing codebase for any functions that accept input data.

Steps:

Identify Functions with Inputs:
- Locate functions that take input data as arguments.
- Note any assumptions or requirements for these inputs.
Implement Input Validation:
- Add checks to ensure inputs meet expected criteria.
- Provide clear and informative error messages for invalid inputs.
Document Assumptions:
- Update your documentation to include all assumptions and input requirements.
Write Unit Tests:
- Create tests for both valid and invalid inputs.
- Ensure that your functions behave as expected in all cases.