[1] 🎈Simulation Schemes

For simulating competing risks (CR) data, we have to simulate event times and event types. Below are two schemes for simulation:

  1. First simulate event types, then simulate event times: This scheme simulates from a proportional subdistribution hazards (PSDH) model. It is implemented in ‘fastcmprsk’ R package (Fine and Gray 1999; Kawaguchi et al. 2019).

  2. First simulate event times, then simulate event types: This scheme simulates from a proportional cause-specific hazards (PCSH) model. It is implemented in ‘survsim’ R package (Beyersmann et al. 2009; Moriña and Navarro 2017).



[2] 🎈Simulation Example

[2.1] 🎈🎈User-chosen parameters and covariates

rm(list=ls())
n = 100 # number of samples to be generated
J = 2   # number of competing events
X = matrix( rnorm(n*3), ncol=3, dimnames = list(c(),c("X1","X2","X3")) ) # cova-
        #riate matrix with 3 covariates X1, X2, X3
beta1 = c(log(2.5),log(2),-log(2)) # effect of the covariates on event 1
beta2 = c(0,0,-log(1.2)) # effect of the covariates on event 2

X[1:6,]
##               X1         X2           X3
## [1,]  0.70348073 -0.8579075 -0.001157996
## [2,] -0.36981807 -0.4065081 -1.123856052
## [3,]  0.23529715 -0.3591893  0.096325460
## [4,] -0.06014776  0.1165779 -1.181138932
## [5,]  1.80348941  1.0329221  0.440373011
## [6,] -0.66629612 -0.3824985 -0.618692255



[2.2] 🎈🎈CR data simulation using PSDH (limited to two causes)

# ⭕️ Load source 

source("source/simulateTwoCauseFineGrayModel1.R") # an extension of fastcmprsk::simulateTwoCauseFineGrayModel()
#library(fastcmprsk)

# arguments of simulateTwoCauseFineGrayModel1() function
#
# nobs = number of samples to be generated
# beta1 = effect of the covariates on event 1
# beta2 = effect of the covariates on event 2
# X = covariate matrix
# dist.cens = censoring distribution, can be "llogistic", "weibull", "lnorm", "unif"
# beta0.cens = shape/first parameter of the censoring distribution
# anc.cens = scale/second parameter of the censoring distribution
# p = prevalence when all X's are zero
# returnX = whether X should be returned 

# ⭕️ Simulate data with varying censoring distribution and prevalence 

dist.cens.vec = c("llogistic", "weibull", "lnorm", "unif") # varying censoring distribution
beta0.cens.vec = c(0, 0.1, 0.2, 0.3) # varying censoring distribution params
anc.cens.vec = c(1, 1.1, 1.2, 1.3) # varying censoring distribution params
prev_0.vec = c(0.05, 0.03, 0.01) # varying prevalence

data = list() # will hold 4x4x4x3 = 192 simulated datasets

for(i in seq(dist.cens.vec))
for(j in seq(beta0.cens.vec))
for(k in seq(anc.cens.vec))
for(l in seq(prev_0.vec))
{
    d = simulateTwoCauseFineGrayModel1(nobs = n, beta1 = beta1, beta2 = beta2, X = X, 
    dist.cens = dist.cens.vec[i], beta0.cens = beta0.cens.vec[j], anc.cens = anc.cens.vec[k],
    p = prev_0.vec[l], returnX = F)
    auxdata = data.frame( cause = d$fstatus, time = d$ftime )

    data[[paste(dist.cens.vec[i],beta0.cens.vec[j],anc.cens.vec[k],prev_0.vec[l],sep="_")]] = auxdata
}
    
length(data)

data[1]

# ⭕️ Investigate the effect of the parameters on the simulated data

# HOMEWORK :)



[2.3] 🎈🎈CR data simulation according to PCSH

# ⭕️ Load source 

library(survsim)


# ⭕️ Simulate data with varying event and censoring distribution

betaC = cbind(beta1,beta2)
beta = lapply( 1:nrow(betaC), function(i) betaC[i,] )  # do this to match the input format
maxtime = 20
dist.ev = c("weibull","weibull")  # "lnorm","llogistic"
anc.ev = c(2.5,1.5)
beta0.ev = c(1,1)
dist.cens.vec = c("llogistic", "weibull", "lnorm", "unif") # varying censoring distribution
beta0.cens.vec = c(0, 0.1, 0.2, 0.3) # varying censoring distribution params
anc.cens.vec = c(1, 1.1, 1.2, 1.3) # varying censoring distribution params

data = list() # will hold 4x4x4x3 = 192 simulated datasets

for(j in seq(dist.cens.vec))   # WILL TAKE ~15 mins to run
for(k in seq(beta0.cens.vec))
for(l in seq(anc.cens.vec))
{
    d = lapply( 1:n, function(ii)  # have to use this internal function for user-supplied X 
        crisk.ncens.sim( i = i, nsit = 2, foltime = maxtime, beta = beta, eff = X[i,], 
            dist.ev = dist.ev, anc.ev = anc.ev, beta0.ev = beta0.ev, 
            dist.cens = dist.cens.vec[j], beta0.cens = beta0.cens.vec[k], 
            anc.cens = anc.cens.vec[l] ) 
      )

    auxdata = do.call(rbind, d)
  auxdata$cause[auxdata$status == 0] = 0 #set censoring code as 0
  auxdata = auxdata[ , names(auxdata) %in% c("cause","time")]
  rownames(auxdata) = NULL
  
    data[[paste(dist.cens.vec[j],beta0.cens.vec[k],anc.cens.vec[l],sep="_")]] = auxdata
}

length(data)

data[1]


# ⭕️ Investigate the effect of the parameters on the simulated data

# HOMEWORK :)

References and other resources

  • 1999 - Fine & Gray - A proportional hazards model for the subdistribution of a competing risk
  • 2009 - Bayersmann et al. - Simulating competing risks data in survival analysis
  • 2017 - Morina & Navarro - Competing risks simulation with the survsim R package
  • 2019 - Kawaguchi et al. - A Fast and Scalable Implementation Method for Competing Risks Data with the R Package fastcmprsk