---
title: "Speed comparison"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Speed comparison}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

# Purpose

FastSurvival is designed for repeated evaluation inside large simulation loops. This vignette shows how to benchmark each function against its standard counterpart and reports representative results.

The benchmark code is shown but not executed when the vignette is built, because timing many microbenchmark replicates would exceed the build-time limits. To reproduce the numbers, run the code blocks interactively.

The reported speed gains are ratios of median times from microbenchmark replicates on a single desktop machine. Absolute timings depend on hardware, sample size, and event rate, so the ratios matter more than the raw values.

```{r load}
library(FastSurvival)
library(survival)
library(microbenchmark)

# Comparison packages used in the benchmarks below
library(survRM2)
library(nph)
```

# Setup

One source of the speed gain is that the analysis functions accept pre-sorted vectors. Inside a simulation loop the data are sorted once and reused, so the sort cost is paid a single time rather than on every call. We build a single phase-3-sized dataset for the benchmarks.
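To illustrate why pre-sorted input helps, here is a minimal pure-R sketch of the single-pass idea: a Kaplan-Meier estimate at one time point, computed by walking once through data already sorted by time. The helper `km_at()` is hypothetical and is not part of FastSurvival. The package does this work in C++, and its actual implementation may differ. Ties follow the usual convention: subjects censored at an event time still count as at risk at that time.

```r
# Hypothetical helper for illustration only; not part of FastSurvival.
# Kaplan-Meier survival at a single time point, computed in one pass
# over vectors that are already sorted by time (ascending).
km_at <- function(time, event, t_eval) {
  n <- length(time)
  surv <- 1
  i <- 1L
  while (i <= n && time[i] <= t_eval) {
    # Advance j to the last subject tied with time[i]
    j <- i
    while (j < n && time[j + 1L] == time[i]) j <- j + 1L
    d <- sum(event[i:j])    # events at this distinct time
    at_risk <- n - i + 1L   # subjects with time >= time[i]
    if (d > 0) surv <- surv * (1 - d / at_risk)
    i <- j + 1L
  }
  surv
}

# Usage: sort once, then reuse the sorted vectors for several evaluations
set.seed(1)
t_demo <- rexp(600, rate = 0.05)
e_demo <- rbinom(600, 1, 0.8)
ord_demo <- order(t_demo)
t_sorted <- t_demo[ord_demo]
e_sorted <- e_demo[ord_demo]
sapply(c(10, 20, 30), function(t) km_at(t_sorted, e_sorted, t))
```

Because `km_at()` only reads the vectors in order, the sort is done once outside the loop and never repeated. Each value should agree with `summary(survfit(Surv(t_demo, e_demo) ~ 1), times = t)$surv` up to rounding. The benchmark dataset follows the same pattern: it is sorted once, and the sorted copies are passed to every `*_fast()` call.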
```{r data}
set.seed(1)
n  <- 600
tt <- rexp(n, rate = 0.05)    # observed times
ev <- rbinom(n, 1, 0.8)       # event indicator (about 80% events)
gp <- rep(1:2, each = n / 2)  # group; group 1 is the control arm

# Sort once; the *_fast() calls below use these pre-sorted copies
ord <- order(tt)
t_s <- tt[ord]
e_s <- ev[ord]
g_s <- gp[ord]
```

# survfit_fast vs survfit + summary

```{r bench-survfit}
microbenchmark(
  fast = survfit_fast(t_s, e_s, t_eval = 20, presorted = TRUE),
  base = summary(survfit(Surv(tt, ev) ~ 1), times = 20),
  times = 1000
)
```

# survdiff_fast vs survdiff

```{r bench-survdiff}
microbenchmark(
  fast = survdiff_fast(t_s, e_s, g_s, control = 1, side = 2, presorted = TRUE),
  base = survdiff(Surv(tt, ev) ~ gp),
  times = 1000
)
```

# coxph_fast vs coxph

```{r bench-coxph}
microbenchmark(
  fast = coxph_fast(t_s, e_s, g_s, control = 1, presorted = TRUE),
  base = coxph(Surv(tt, ev) ~ I(gp == 2)),
  times = 1000
)
```

# rmst_fast vs survRM2::rmst2

```{r bench-rmst}
# rmst2() expects a 0/1 arm indicator
arm <- as.integer(gp == 2)

microbenchmark(
  fast = rmst_fast(t_s, e_s, g_s, control = 1, tau = 20, presorted = TRUE),
  base = survRM2::rmst2(time = tt, status = ev, arm = arm, tau = 20),
  times = 1000
)
```

# survdiff_fast(weight = "fh") vs nph::logrank.test

```{r bench-wlr}
# Fleming-Harrington FH(0, 1) weighted log-rank test
microbenchmark(
  fast = survdiff_fast(t_s, e_s, g_s, control = 1, side = 2,
                       weight = "fh", rho = 0, gamma = 1,
                       presorted = TRUE),
  base = nph::logrank.test(tt, ev, gp, rho = 0, gamma = 1),
  times = 1000
)
```

# ahr_fast vs AHR::ahrKM

The Kalbfleisch-Prentice average hazard ratio is benchmarked against `ahrKM()` from the AHR package, the reference implementation used by Dormuth et al. (2024). Because AHR has been archived on CRAN, this benchmark is shown as a static block rather than a live chunk, so the vignette carries no undeclared dependency. Install AHR with `remotes::install_github("cran/AHR")` and run the block to reproduce it.
```r
library(AHR)

dat <- data.frame(tt = tt, ev = ev, gp = gp)

microbenchmark(
  fast = ahr_fast(t_s, e_s, g_s, control = 1, tau = 20, presorted = TRUE),
  base = AHR::ahrKM(20, Surv(tt, ev) ~ gp, dat),
  times = 1000
)
```

# Representative results

The table below summarizes representative speed gains, each the ratio of the standard function's median time to the FastSurvival function's median time, on a typical phase-3 dataset (n = 600, about 80% events) with `presorted = TRUE`. The exact values will differ on your machine, but the order of magnitude of the speedup is stable.

| Function | Replaces | Approximate speed gain (median-time ratio) |
|----------|----------|--------------------------------------------|
| `survfit_fast()` | `survfit()` + `summary()` at one time point | ~70x |
| `survdiff_fast()` | `survdiff()` | ~50x |
| `coxph_fast()` | `coxph()` (point estimate + Wald CI) | ~50x |
| `rmst_fast()` | `survRM2::rmst2()` | ~60x |
| `survdiff_fast(weight = "fh")` | `nph::logrank.test()` | ~500x |
| `ahr_fast()` | `AHR::ahrKM()` | ~130x |

# Why it is faster

Each function avoids overhead that the standard implementations incur on every call. The standard functions parse a formula, build an S3 model object, and construct intermediate vectors before producing the result. That design suits interactive use but is wasteful when the same operation is repeated thousands of times.

The FastSurvival functions take plain vectors, do the core computation in a single C++ pass over the data, and return a lightweight numeric vector. When the input is already sorted, the sort cost is avoided entirely. In a simulation loop these savings accumulate across every iteration.

# References

Kalbfleisch, J. D., & Prentice, R. L. (1981). Estimation of the average hazard ratio. *Biometrika*, 68(1), 105-112.

Dormuth, I., Pauly, M., Rauch, G., & Herrmann, C. (2024). Sample size calculation under nonproportional hazards using average hazard ratios. *Biometrical Journal*, 66(6), e202300271.