Time to Code

Introduction to the Breast Cancer Modeling Problem

We will develop a 4-state Markov model to perform an analysis of the expected outcomes for a cohort of 55-year-old women who have undergone tumor excision for localized breast cancer. After surgery, all patients are initially tumor-free (i.e., will start in the “Local” Markov state). Each year, patients in the Local state face a 2% chance that they will experience a recurrence. Of these recurrences, 75% will present as metastatic disease (and patients will enter the “Mets” Markov state), and the remainder present as local disease (and patients will enter the “Recur” Markov state). Patients in the Recur state undergo repeat resection. Following this operation, they face a 6% chance of recurrence each year (assume that all local recurrences - 1st, 2nd, 3rd, etc. - have equivalent prognoses). Of these recurrences, 90% will present as metastatic disease. If the patients do not recur after the initial local recurrence, they remain in the Recur state since the prognosis remains the same each year. Only patients with metastatic disease can die of breast cancer. All women face a probability of dying from other causes based on the U.S. life tables. Women who die (either of breast cancer or other causes) will enter the “Dead” Markov state.

The Markov model described above represents usual care for women with breast cancer. Once you have constructed that Markov model you will then add a second strategy of a hypothetical new treatment. The effect of this new treatment is to reduce the initial risk of cancer recurrence by 50%; however, the treatment also lowers the quality of life of women taking the medication because of side effects. Treatment is only taken by women who have not experienced a recurrence (i.e., those who reside in the Local state).

After comparing the two strategies based on quality-adjusted life expectancy, you will incorporate costs and calculate the incremental cost per quality-adjusted life year gained of treatment compared with usual care.


Utilities

In a standard decision tree, the utility values are assigned at the end of the tree at each of the terminal nodes. In a Markov model, the terminal node specifies what state to go to for the next cycle, rather than the payoff.

There are three types of utilities in a Markov model:

  • The initial utility (Init) of a state is the utility a “person” accrues because of being in a state when the model starts (applied only in the first cycle).

  • The incremental utility (Incr) of a state is the utility a “person” accrues each time she or he arrives in that state as the model runs (applied many times).

  • The final utility (Final) of a state is the utility a “person” accrues if she or he is in that state at the time the model finishes running (applied only in the final cycle).

You will define these utilities using utility variables (uLocal, uRecur, and uMets) in order to be able to calculate quality-adjusted life expectancy, and use a discount variable (r) that will allow you to discount life expectancy (and quality-adjusted life expectancy). The variable “uLocal” is the utility for being in the Local state. The variable “uRecur” is the utility for being in the Recur state. The variable “uMets” is the utility for being in the Mets state. If these utility values are equal to 1, then the expected value represents life expectancy (the model gives “credit” of 1 full year for spending a year in any “non-dead” state). We will use these to calculate quality-adjusted life expectancy.

Branch Initial* Utility Assignments Incremental Final
Local 0.5*uLocal uLocal/(1+r)^_stage 0
Recur 0 uRecur/(1+r)^_stage 0
Mets 0 uMets/(1+r)^_stage 0
Dead 0 0 0

* First year gets credit for half a year - known as the “half-cycle correction.”

Tasks

  1. Setting up the R Scripts:

    • Create an R script named utils.R. This script will house all the functions we will develop for our analysis.
    • Create another R script named CEA.R (Cost-Effectiveness Analysis). This will be the main script where we will perform our analysis.
  2. Initializing the CEA.R Script: Define the following variables: Age, muMets, pBCA2, pMets1, pMets2, r, uMets, and uRecur.

Variable Value
uRecur 0.80
uMets 0.40
r 0.03
pMets2 0.90
pMets1 0.75
pBCA2 0.06
muMets 0.50
Age 55

These are the variables that you will define globally. In the CEA.R script, start by copying and pasting the following code block:

##############################################################################
# Author: [Your Name]
# Purpose: Main analysis file for breast cancer treatment simulation
##############################################################################

# Libraries used
library(readxl) # used to hazard tables in from Excel 
library(ggplot2) # used to graph results

# Functions for our analysis
source('code/utils.R')

# Mortality tables
mufem <- read_excel("data/mufem.xlsx", sheet = 'FEMALES') 

##############################################################################
# Run the simulation
##############################################################################

# Global variables
uRecur <- 0.80
uMets <- 0.40
uDead <- 0
r <- 0.03
pMets2 <- 0.90
pMets1 <- 0.75
pBCA2 <- 0.06
muMets <- 0.50
Age <- 55
  1. Save and Commit:

    • Save both scripts in your project directory.
    • Commit these initial scripts to your Git repository, ensuring that you’re tracking the changes and setting a strong foundation for version control.
  2. Writing some helper functions:

The function for pDie (shown below) converts the annual rates from the life table (in mufem) to annual probabilities. The first cycle (year 1) through the Markov model, stage is equal to 0. Hence, the rate selected from the table is that associated with the starting age of the cohort (55). The second cycle through the model, stage is equal to 1 and the annual rate selected from the mufem table is that associated with 56-year-old women.

pDie <- function(stage) {
  age_index <- Age + stage
  rate <- mufem$`annual hazard rate`[age_index]
  return(1 - exp(-rate))
}

The overall annual mortality rate for women with metastatic disease is equal to the sum of the rates of dying from other causes and the excess disease-specific mortality rate (defined as muMets). The function for pDieMets (shown below) converts the sum of these rates to annual probabilities.

pDieMets <- function(stage) {
  age_index <- Age + stage
  rate <- mufem$`annual hazard rate`[age_index] + muMets
  return(1 - exp(-rate))
}

Lets add both of these functions to our utils.R script.

  1. Writing main simulation function and defining local variables:

You should have 2 undefined variables now: pBCA1 and uLocal. Since the value of pBCA1 and uLocal will be different depending on the strategy, you will define these two variables locally within our simulation function.

  • “Usual Care”: uLocal = 0.95 and pBCA1 = 0.02.

  • “Treat”: uLocal = 0.90 and pBCA1 = 0.01.

Add the following function to your utils.R script.

# Define the Markov simulation function
simulate_markov <- function(initial_population=1, cycles=45, strategy='Usual Care') {

if(strategy == 'Usual Care'){
  pBCA1 <- 0.02
  uLocal <- 0.95
  
} else{
  pBCA1 <- 0.01
  uLocal <- 0.90
}

utilities <- c(uLocal, uRecur, uMets, uDead)

results <- data.frame(Cycle = 1:cycles, 
                      Local = rep(0, cycles), 
                      Recur = rep(0, cycles), 
                      Mets = rep(0, cycles),
                      Dead = rep(0, cycles),
                      cumulative.utility = rep(0, cycles))

current_population <- c(Local = initial_population, Recur = 0, Mets = 0, Dead = 0)

for (cycle in 1:cycles) {
  
  if(cycle == 1){ #half cycle correction
    utility_value <- sum(current_population*utilities) * 0.5
  }
  
  # Transition logic
  local_death <- current_population["Local"] * pDie(cycle)
  recur_death <- current_population["Recur"] * pDie(cycle)
  
  local_recur <- current_population["Local"] * pBCA1
  local_mets <- local_recur * pMets1
  local_recur <- local_recur * (1 - pMets1)
  
  recur_recur <- current_population["Recur"] * pBCA2
  recur_mets <- recur_recur * pMets2
  recur_recur <- recur_recur * (1 - pMets2)
  
  mets_death <- current_population["Mets"] * pDieMets(cycle)
  
  # Update populations
  current_population["Local"] <- current_population["Local"] - local_recur - local_mets - local_death
  current_population["Recur"] <- current_population["Recur"] + local_recur - recur_recur - recur_mets - recur_death
  current_population["Mets"] <- current_population["Mets"] + local_mets + recur_mets - mets_death
  current_population["Dead"] <- current_population["Dead"] + mets_death + local_death + recur_death
  
  # Update QALY
  utility_value <- utility_value + (sum(current_population*utilities)/((1 + r)^cycle))

  # Store results for this cycle
  results[cycle, 2:6] <- c(current_population, utility_value)
}

return(results)
}
  1. Run the model and save some outputs:

Great now we have everything we need to run this first analysis! Lets add the following code under “Run the simulation” in the CEA.R script.

##############################################################################
# Run the simulation
##############################################################################

# Global variables
uRecur <- 0.80
uMets <- 0.40
uDead <- 0
r <- 0.03
pMets2 <- 0.90
pMets1 <- 0.75
pBCA2 <- 0.06
muMets <- 0.50
Age <- 55

results_usual <- simulate_markov(strategy = 'Usual Care')
results_treat <- simulate_markov(strategy = 'Treatment')

If this code runs we are going to add one more section:

##############################################################################
# Visualization
##############################################################################

# Plot the results for both groups
plot_usual <- ggplot(results_usual, aes(x = Cycle)) +
  geom_line(aes(y = Local, color = "Local")) +
  geom_line(aes(y = Recur, color = "Recur")) +
  geom_line(aes(y = Mets, color = "Mets")) +
  geom_line(aes(y = Dead, color = "Dead")) +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Markov Model Simulation for Usual Care",
       y = "Number of Individuals") +
  theme_minimal()

ggsave('output/figures/Usual Care Progression.png', 
       plot_usual, width = 7, height = 4, bg='white')

plot_treatment <- ggplot(results_treat, aes(x = Cycle)) +
  geom_line(aes(y = Local, color = "Local")) +
  geom_line(aes(y = Recur, color = "Recur")) +
  geom_line(aes(y = Mets, color = "Mets")) +
  geom_line(aes(y = Dead, color = "Dead")) +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Markov Model Simulation for Treatment",
       y = "Number of Individuals") +
  theme_minimal()

ggsave('output/figures/Treatment Progression.png', 
       plot_treatment, width = 7, height = 4, bg='white')

This code generates two figures showing the disease progression of the cohort and exports them straight to our output/figures folder. This means that everytime we run this script these figures are getting generated. This is good because we want code that can generate all results and figures from scratch everytime!

If you have not commited and pushed code in a while you should do so!!!!

  1. Let’s add in costs!:

Cost-Effectiveness Analysis

You will now add costs as a 2nd Markov reward for each of the 4 states. Any differences between the two strategies in terms of costs will have to be locally assigned. The cost assignments are given in the following table. You will need to create variables when you enter unrecognized variables.

Cost Assignments

Branch Initial* Incremental Final
Local 0.5*(cLocal+cRx) (cLocal+cRx)/(1+r)^_stage 0
Recur 0 cRecur/(1+r)^_stage 0
Mets 0 cMets/(1+r)^_stage 0
Dead 0 0 0

* First year gets credit for half a year - known as the “half-cycle correction.”

The annual costs of breast cancer (cLocal, cRecur, and cMets) will be the same for both strategies

Variable Value
cRecur 5000
cMets 20000
cLocal 500

Therefore, these variables can be assigned globally. So let’s go ahead and add the following to CEA.R:

cRecur <- 5000
cMets <- 20000
cLocal <- 500

To locally assign a cost for cRx (the annual cost of treatment), we will do so in our markov model function we wrote earlier. Here we can either decided to modify the one we had previously written, or we could write a new function. I am going to write a new function and add it utils.R. We will need to locally assign the cost of treatment (cRx), which is 0 if “Usual Care” and 1000 if “Treatment”

# Define the Markov simulation function with costs
simulate_markov_costs <- function(initial_population=1, cycles=45, strategy='Usual Care') {
  
  if(strategy == 'Usual Care'){
    pBCA1 <- 0.02
    uLocal <- 0.95
    cRx <- 0

  } else{
    pBCA1 <- 0.01
    uLocal <- 0.90
    cRx <- 1000
  }
  
  utilities <- c(uLocal, uRecur, uMets, uDead)
  costs <- c(cLocal + cRx, cRecur, cMets, 0)
  
  results <- data.frame(Cycle = 1:cycles, 
                        Local = rep(0, cycles), 
                        Recur = rep(0, cycles), 
                        Mets = rep(0, cycles),
                        Dead = rep(0, cycles),
                        cumulative.utility = rep(0, cycles),
                        cumulative.costs = rep(0,cycles))
  
  current_population <- c(Local = initial_population, Recur = 0, Mets = 0, Dead = 0)
  
  for (cycle in 1:cycles) {
    
    if(cycle == 1){ #half cycle correction
      utility_value <- sum(current_population*utilities) * 0.5
      cost_value <- sum(current_population*costs) * 0.5
    }
    
    # Transition logic
    local_death <- current_population["Local"] * pDie(cycle)
    recur_death <- current_population["Recur"] * pDie(cycle)
    
    local_recur <- current_population["Local"] * pBCA1
    local_mets <- local_recur * pMets1
    local_recur <- local_recur * (1 - pMets1)
    
    recur_recur <- current_population["Recur"] * pBCA2
    recur_mets <- recur_recur * pMets2
    recur_recur <- recur_recur * (1 - pMets2)
    
    mets_death <- current_population["Mets"] * pDieMets(cycle)
    
    # Update populations
    current_population["Local"] <- current_population["Local"] - local_recur - local_mets - local_death
    current_population["Recur"] <- current_population["Recur"] + local_recur - recur_recur - recur_mets - recur_death
    current_population["Mets"] <- current_population["Mets"] + local_mets + recur_mets - mets_death
    current_population["Dead"] <- current_population["Dead"] + mets_death + local_death + recur_death
    
    # Update QALY
    utility_value <- utility_value + (sum(current_population*utilities)/((1 + r)^cycle))

    # Update QALY
    cost_value <- cost_value + (sum(current_population*costs)/((1 + r)^cycle))
    
    # Store results for this cycle
    results[cycle, 2:7] <- c(current_population, utility_value, cost_value)
  }
  
  return(results)
}

And now we are almost done! Let’s just use this function to calculate our costs and we will export those tables to our outputs/tables folder. So now in our CEA.R script let’s add:

##############################################################################
# Export cost tables
##############################################################################

cRecur <- 5000
cMets <- 20000
cLocal <- 500

costs_usual <- simulate_markov_costs(strategy = 'Usual Care')
costs_treat <- simulate_markov_costs(strategy = 'Treatment')

write.csv(costs_usual, 'output/tables/Costs Usual Care.csv')
write.csv(costs_treat, 'output/tables/Costs Treatment.csv')

Don’t forget to commit and push your changes!!!

And congratulations! If you made it successfully through this your repository should look something like mine:

https://github.com/jacobjameson/BCP_Jameson_Coots/