Time to Code Part II
Robustness
Error Management
Protect the User:
- Make assumptions and expectations explicit.
- Check values before processing them.
- Identify and manage exceptions.
- Produce errors when expectations are not met.
- Consider error handling options:
  - Redirect the program flow.
  - Log or report the error to allow the user or developer to troubleshoot.
  - If necessary, abort the execution.
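The three options above can be sketched in Python as follows. This is a minimal illustration, not a prescribed design: `process_patients`, its `strict` flag, and the `(age, cholesterol_level)` record layout are hypothetical names introduced for the example, and the risk formula is the toy one used throughout this section.

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(__name__)


def calculate_cardiovascular_risk(age, cholesterol_level):
    """Toy risk score; raises ValueError on a non-positive cholesterol level."""
    if cholesterol_level <= 0:
        raise ValueError("Cholesterol level must be positive.")
    return age * 0.2 + cholesterol_level * 0.8


def process_patients(records, strict=False):
    """Score each (age, cholesterol_level) pair.

    strict=False: log the problem and skip the record (redirect the flow).
    strict=True: log, then re-raise so the run stops at the first bad record (abort).
    """
    scores = []
    for age, cholesterol_level in records:
        try:
            scores.append(calculate_cardiovascular_risk(age, cholesterol_level))
        except ValueError as exc:
            # Log enough context for the user or developer to troubleshoot.
            logger.warning("Bad record (%s, %s): %s", age, cholesterol_level, exc)
            if strict:
                raise
    return scores


# The second record is invalid: it is logged and skipped.
print(process_patients([(55, 180), (60, -5)]))
```

Whether to skip or abort depends on the analysis: skipping keeps a long batch running, while aborting guarantees that no result is produced from partly invalid data.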
Advanced Robustness: Unit Tests
Protect the Developer (You!):
- Test the expected behavior of your functions:
  - Confirm that a known input produces the expected output.
  - Ensure that errors are produced as expected when invalid inputs are provided.
  - Capture unexpected errors to identify further opportunities for error management.
- Automate running tests when pushing to version control systems like GitHub using Continuous Integration (CI).
- Unit tests are especially valuable as your project grows in complexity.
More on tests later…
Throwing an Error (Python Example)
Suppose you have a function that calculates the risk of a cardiovascular event based on patient data.
def calculate_cardiovascular_risk(age, cholesterol_level):
    if cholesterol_level <= 0:
        raise ValueError("Cholesterol level must be positive.")
    risk = age * 0.2 + cholesterol_level * 0.8
    return risk

# This will raise an error
calculate_cardiovascular_risk(age=55, cholesterol_level=-180)
Why not simply adjust the function output?
def calculate_cardiovascular_risk(age, cholesterol_level):
    if cholesterol_level <= 0:
        return None
    risk = age * 0.2 + cholesterol_level * 0.8
    return risk

# This will return None, which may be ambiguous
print(calculate_cardiovascular_risk(age=55, cholesterol_level=-180))
Because returning None is ambiguous: the caller cannot tell whether it is expected behavior or a sign of a problem. Explicit errors make issues visible as soon as they occur.
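To make the ambiguity concrete, the sketch below contrasts the two designs. The names `risk_or_none` and `risk_or_raise` are hypothetical, introduced only to keep both variants side by side. With `None`, the failure surfaces later, in unrelated code, with a message that says nothing about cholesterol; with an explicit error, it surfaces at the source.

```python
def risk_or_none(age, cholesterol_level):
    """Variant that signals invalid input by returning None."""
    if cholesterol_level <= 0:
        return None
    return age * 0.2 + cholesterol_level * 0.8


def risk_or_raise(age, cholesterol_level):
    """Variant that signals invalid input with an explicit error."""
    if cholesterol_level <= 0:
        raise ValueError("Cholesterol level must be positive.")
    return age * 0.2 + cholesterol_level * 0.8


patients = [(55, 180), (60, -180)]

# With None, the problem only shows up downstream, far from its cause.
scores = [risk_or_none(age, chol) for age, chol in patients]
try:
    average = sum(scores) / len(scores)
except TypeError as exc:
    print(f"Downstream failure: {exc}")

# With an explicit error, the message names the actual problem.
try:
    checked_scores = [risk_or_raise(age, chol) for age, chol in patients]
except ValueError as exc:
    print(f"Failure at the source: {exc}")
```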
Warning Messages Without Breaking Execution (R Example)
An error stops code execution:
calculate_cardiovascular_risk <- function(age, cholesterol_level) {
  if (cholesterol_level <= 0) {
    stop("Cholesterol level must be positive.")
  }
  risk <- age * 0.2 + cholesterol_level * 0.8
  return(risk)
}

# This will throw an error
calculate_cardiovascular_risk(age = 55, cholesterol_level = -180)
A warning reports the problem but lets execution continue:
calculate_cardiovascular_risk <- function(age, cholesterol_level) {
  if (cholesterol_level <= 0) {
    warning("Cholesterol level must be positive.")
    return(NA)
  }
  risk <- age * 0.2 + cholesterol_level * 0.8
  return(risk)
}

# This will issue a warning and return NA
calculate_cardiovascular_risk(age = 55, cholesterol_level = -180)
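For comparison, Python's standard `warnings` module supports the same pattern. The sketch below is an assumed Python counterpart of the R function above, not part of the original example; it returns `float("nan")` as a stand-in for R's `NA`.

```python
import math
import warnings


def calculate_cardiovascular_risk(age, cholesterol_level):
    """Warn and return NaN instead of stopping on an invalid cholesterol level."""
    if cholesterol_level <= 0:
        # stacklevel=2 makes the warning point at the caller's line.
        warnings.warn("Cholesterol level must be positive.", UserWarning, stacklevel=2)
        return float("nan")
    return age * 0.2 + cholesterol_level * 0.8


# Execution continues after the warning; the caller must check for NaN.
risk = calculate_cardiovascular_risk(age=55, cholesterol_level=-180)
print(math.isnan(risk))
```

As with `NA` in R, the burden shifts to the caller: every downstream step has to cope with the missing value.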
Redirecting with Exceptions (Python)
If you do not want to interrupt your script when an error is raised, use try and except blocks.
def calculate_cardiovascular_risk(age, cholesterol_level):
    if cholesterol_level <= 0:
        raise ValueError("Cholesterol level must be positive.")
    risk = age * 0.2 + cholesterol_level * 0.8
    return risk

try:
    risk_score = calculate_cardiovascular_risk(55, -180)
except ValueError as e:
    print(f"Error encountered: {e}")
    # Handle the error, perhaps by setting a default value or skipping this entry
    risk_score = None
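One design point in the example above is that it catches `ValueError` only. The sketch below wraps the same logic in a hypothetical helper, `safe_risk`, to show that the anticipated error is handled while anything unexpected still propagates.

```python
def calculate_cardiovascular_risk(age, cholesterol_level):
    if cholesterol_level <= 0:
        raise ValueError("Cholesterol level must be positive.")
    return age * 0.2 + cholesterol_level * 0.8


def safe_risk(age, cholesterol_level, default=None):
    """Return the risk score, or `default` when the input is rejected."""
    try:
        risk_score = calculate_cardiovascular_risk(age, cholesterol_level)
    except ValueError as e:
        # Only the anticipated error is handled here.
        print(f"Error encountered: {e}")
        return default
    else:
        # Runs only when no exception was raised.
        return risk_score


print(safe_risk(55, -180))  # handled: prints the message, then None
print(safe_risk(55, 180))
```

Calling `safe_risk(55, "180")` is not handled: comparing a string with `0` raises `TypeError`, which passes through the `except ValueError` clause untouched. That is usually what you want, because an unexpected error points to a case the validation does not yet cover.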
Redirecting with Exceptions (R)
In R, you can use tryCatch() to handle exceptions gracefully.
calculate_cardiovascular_risk <- function(age, cholesterol_level) {
  if (cholesterol_level <= 0) {
    stop("Cholesterol level must be positive.")
  }
  risk <- age * 0.2 + cholesterol_level * 0.8
  return(risk)
}

result <- tryCatch({
  calculate_cardiovascular_risk(55, -180)
}, error = function(e) {
  message("Error encountered: ", e$message)
  # Handle the error, perhaps by setting a default value or skipping this entry
  return(NA)
})
print(result)
Validating Input
Consider early statements in your script to validate input data.
Python Example:
if not patient_data:
    raise ValueError("Patient data cannot be empty.")
if not isinstance(patient_age, int) or patient_age <= 0:
    raise ValueError("Patient age must be a positive integer.")
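The checks above can be collected into a small helper so they run once, at the top of a script. In this sketch `validate_patient` is a hypothetical name and `patient_data` is assumed to be a list of records. One caveat: in Python, `bool` is a subclass of `int`, so `isinstance(True, int)` is `True`; the helper therefore rejects booleans explicitly.

```python
def validate_patient(patient_data, patient_age):
    """Raise ValueError if the inputs break the stated expectations."""
    if not patient_data:
        raise ValueError("Patient data cannot be empty.")
    # bool is a subclass of int, so exclude it explicitly.
    is_int = isinstance(patient_age, int) and not isinstance(patient_age, bool)
    if not is_int or patient_age <= 0:
        raise ValueError("Patient age must be a positive integer.")


validate_patient([{"id": 1}], 55)  # valid input: passes silently

# Each of these ages is rejected with the same informative message.
for bad_age in (-3, 55.5, "55", True):
    try:
        validate_patient([{"id": 1}], bad_age)
    except ValueError as e:
        print(f"{bad_age!r}: {e}")
```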
R Example:
if (nrow(patient_data) == 0) {
  stop("Patient data cannot be empty.")
}
if (!is.numeric(patient_age) || patient_age <= 0) {
  stop("Patient age must be a positive number.")
}
Expectations and Assumptions
Anticipate Potential Issues:
- Invalid Input Values:
  - Users may input unrealistic or impossible values (e.g., negative ages, probabilities greater than 1).
- Incomplete or Missing Data:
  - Essential data fields may be missing or contain null values.
- Incorrect Data Types:
  - Numerical fields may be input as strings, or categorical variables may not match expected categories.
- Edge Cases:
  - Inputs at the extremes of acceptable ranges may cause unexpected behavior.
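Each category above corresponds to a concrete check. The following sketch is illustrative only: `find_problems`, the field names (`age`, `event_probability`, `smoking_status`), and the list of allowed categories are all hypothetical, chosen to give one check per category.

```python
import math

# Hypothetical set of accepted categories for this example.
ALLOWED_SMOKING_STATUS = ("never", "former", "current")


def find_problems(record):
    """Return a list of human-readable problems found in one patient record."""
    problems = []

    # Incomplete or missing data: required fields absent or null.
    for field in ("age", "event_probability", "smoking_status"):
        if record.get(field) is None:
            problems.append(f"{field} is missing")

    age = record.get("age")
    if age is not None:
        # Incorrect data types: numbers supplied as strings (or booleans).
        if isinstance(age, bool) or not isinstance(age, (int, float)):
            problems.append("age must be numeric")
        # Invalid values: NaN or negative ages.
        elif math.isnan(age) or age < 0:
            problems.append("age must be a non-negative number")

    prob = record.get("event_probability")
    if isinstance(prob, (int, float)) and not isinstance(prob, bool):
        # Edge cases: 0 and 1 are valid, so the comparison is inclusive.
        if not (0 <= prob <= 1):
            problems.append("event_probability must be between 0 and 1")

    status = record.get("smoking_status")
    if status is not None and status not in ALLOWED_SMOKING_STATUS:
        problems.append("smoking_status is not a recognised category")

    return problems


print(find_problems({"age": "54", "event_probability": 1.2, "smoking_status": None}))
```

Collecting every problem before reporting, instead of stopping at the first, lets the user fix a record in one pass.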
Best Practices for Handling Assumptions
Make Assumptions Explicit:
- Documentation:
  - Clearly state all assumptions in your code comments and documentation.
  - Include acceptable input ranges, data types, and expected formats.
- Input Validation:
  - Implement checks to validate input data before processing.
  - Provide informative error messages to guide users in correcting input.
Example: Documenting Assumptions
- In your README.md or documentation:
  - “This model assumes that patient ages are between 0 and 120 years.”
  - “Cholesterol levels must be provided in mg/dL and be within the range of 100 to 400 mg/dL.”
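The same assumptions can also live next to the code, in a docstring, so that `help()` and documentation tools surface them. A minimal sketch using the ranges quoted above; the docstring layout is one common convention, not a requirement, and treating the bounds as inclusive is an assumption of this example.

```python
def calculate_cardiovascular_risk(age, cholesterol_level):
    """Return an illustrative cardiovascular risk score.

    Assumptions
    -----------
    age : int or float
        Patient age in years, between 0 and 120 (inclusive).
    cholesterol_level : int or float
        Cholesterol in mg/dL, between 100 and 400 (inclusive).

    Raises
    ------
    ValueError
        If either input falls outside its documented range.
    """
    if not (0 <= age <= 120):
        raise ValueError("Age must be between 0 and 120.")
    if not (100 <= cholesterol_level <= 400):
        raise ValueError("Cholesterol level must be between 100 and 400 mg/dL.")
    return age * 0.2 + cholesterol_level * 0.8


# The documented assumptions are available at runtime.
print(calculate_cardiovascular_risk.__doc__)
```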
Implementing Input Validation
Python Example:
def calculate_cardiovascular_risk(age, cholesterol_level):
    # Validate inputs
    if not (0 <= age <= 120):
        raise ValueError("Age must be between 0 and 120.")
    if not (100 <= cholesterol_level <= 400):
        raise ValueError("Cholesterol level must be between 100 and 400 mg/dL.")
    # Proceed with calculation
    risk = age * 0.2 + cholesterol_level * 0.8
    return risk

# Usage
try:
    risk_score = calculate_cardiovascular_risk(55, 180)
except ValueError as e:
    print(f"Input error: {e}")
R Example:
calculate_cardiovascular_risk <- function(age, cholesterol_level) {
  # Validate inputs
  if (age < 0 || age > 120) {
    stop("Age must be between 0 and 120.")
  }
  if (cholesterol_level < 100 || cholesterol_level > 400) {
    stop("Cholesterol level must be between 100 and 400 mg/dL.")
  }
  # Proceed with calculation
  risk <- age * 0.2 + cholesterol_level * 0.8
  return(risk)
}

# Usage
risk_score <- tryCatch({
  calculate_cardiovascular_risk(55, 180)
}, error = function(e) {
  message("Input error: ", e$message)
  NA
})
Testing Your Functions
Implement Unit Tests to Verify Behavior:
- Use Testing Frameworks:
  - Python: unittest, pytest
  - R: testthat
- Create Test Cases for:
  - Valid inputs (expected to succeed)
  - Invalid inputs (expected to raise errors)
  - Edge cases (e.g., inputs at the boundary of acceptable ranges)
Example: Python Unit Test with unittest
import unittest

# Assumes calculate_cardiovascular_risk is defined or imported in this module.

class TestCardiovascularRiskCalculation(unittest.TestCase):
    def test_valid_inputs(self):
        self.assertAlmostEqual(calculate_cardiovascular_risk(55, 180), 55 * 0.2 + 180 * 0.8)

    def test_invalid_age(self):
        with self.assertRaises(ValueError):
            calculate_cardiovascular_risk(-1, 180)

    def test_invalid_cholesterol(self):
        with self.assertRaises(ValueError):
            calculate_cardiovascular_risk(55, 500)

if __name__ == '__main__':
    unittest.main()
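The test class above covers valid and invalid inputs; boundary values deserve their own tests. The sketch below uses `subTest` from `unittest` to check each edge of the documented ranges. It redefines the function so that the block runs on its own; the class name `TestBoundaries` is hypothetical, and the suite is run explicitly here only to keep the example self-contained.

```python
import unittest


def calculate_cardiovascular_risk(age, cholesterol_level):
    if not (0 <= age <= 120):
        raise ValueError("Age must be between 0 and 120.")
    if not (100 <= cholesterol_level <= 400):
        raise ValueError("Cholesterol level must be between 100 and 400 mg/dL.")
    return age * 0.2 + cholesterol_level * 0.8


class TestBoundaries(unittest.TestCase):
    def test_values_on_the_boundary_are_accepted(self):
        for age, chol in [(0, 100), (120, 400), (0, 400), (120, 100)]:
            with self.subTest(age=age, cholesterol_level=chol):
                expected = age * 0.2 + chol * 0.8
                self.assertAlmostEqual(
                    calculate_cardiovascular_risk(age, chol), expected
                )

    def test_values_just_outside_are_rejected(self):
        for age, chol in [(-1, 100), (121, 100), (0, 99), (0, 401)]:
            with self.subTest(age=age, cholesterol_level=chol):
                with self.assertRaises(ValueError):
                    calculate_cardiovascular_risk(age, chol)


# Run the suite explicitly instead of calling unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestBoundaries)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

With `subTest`, a failure reports the exact age and cholesterol pair that broke, and the remaining pairs still run.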
Example: R Unit Test with testthat
library(testthat)

test_that("Valid inputs return correct risk", {
  expect_equal(calculate_cardiovascular_risk(55, 180), 55 * 0.2 + 180 * 0.8)
})

test_that("Invalid age throws an error", {
  expect_error(calculate_cardiovascular_risk(-1, 180), "Age must be between 0 and 120")
})

test_that("Invalid cholesterol level throws an error", {
  expect_error(calculate_cardiovascular_risk(55, 500), "Cholesterol level must be between 100 and 400")
})
➡️ Exercise
Task:
Review your existing codebase for any functions that accept input data.
Steps:
- Identify Functions with Inputs:
  - Locate functions that take input data as arguments.
  - Note any assumptions or requirements for these inputs.
- Implement Input Validation:
  - Add checks to ensure inputs meet expected criteria.
  - Provide clear and informative error messages for invalid inputs.
- Document Assumptions:
  - Update your documentation to include all assumptions and input requirements.
- Write Unit Tests:
  - Create tests for both valid and invalid inputs.
  - Ensure that your functions behave as expected in all cases.