--- title: "Validation of FastSurvival" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Validation of FastSurvival} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) have_survival <- requireNamespace("survival", quietly = TRUE) have_survRM2 <- requireNamespace("survRM2", quietly = TRUE) have_survAH <- requireNamespace("survAH", quietly = TRUE) have_nph <- requireNamespace("nph", quietly = TRUE) have_nphRCT <- requireNamespace("nphRCT", quietly = TRUE) ``` # Purpose FastSurvival is built for speed, but a fast estimator is only useful if it returns the same answer as the established implementation. This vignette checks the numerical agreement between each FastSurvival function and a reference from a well-known package: the survival package for the Kaplan-Meier estimate, the log-rank test, the Cox hazard ratio, and milestone survival; survRM2 for the restricted mean survival time; and survAH for the average hazard with survival weight. The weighted log-rank and max-combo tests are checked against the nph package, and the robust modestly-weighted test against the nphRCT package. The same comparisons form the basis of the automated test suite shipped with the package. ```{r load} library(FastSurvival) ``` # Kaplan-Meier survival `survfit_fast()` evaluates the Kaplan-Meier estimate at a single time point. We compare against `summary(survfit(...))` at the same time on the `ovarian` dataset. ```{r km, eval = have_survival} library(survival) ord <- order(ovarian$futime) t_s <- ovarian$futime[ord] e_s <- ovarian$fustat[ord] fast <- survfit_fast(t_s, e_s, t_eval = 500, conf.type = "log") fit <- survfit(Surv(futime, fustat) ~ 1, data = ovarian) ref <- summary(fit, times = 500) data.frame( quantity = c("survival", "std.err"), fast = c(unclass(fast)["surv"], unclass(fast)["std.err"]), survival = c(ref$surv, ref$std.err), row.names = NULL ) ``` # Log-rank test `survdiff_fast()` returns a two-sided chi-square statistic when `side = 2`, matching the statistic from `survdiff()`. ```{r logrank, eval = have_survival} fast_lr <- survdiff_fast(ovarian$futime, ovarian$fustat, ovarian$rx, control = 1, side = 2) ref_lr <- survdiff(Surv(futime, fustat) ~ rx, data = ovarian) c(fast = as.numeric(fast_lr), survival = ref_lr$chisq) ``` # Weighted log-rank test `survdiff_fast()` also computes Fleming-Harrington G(rho, gamma) weighted log-rank tests through the `weight = "fh"` argument. The [nph](https://cran.r-project.org/package=nph) package provides `logrank.test()` with the same family, so we compare the chi-square statistic across several weight choices on the `ovarian` data. ```{r wlr, eval = have_survival && have_nph} fh_grid <- data.frame(rho = c(0, 1, 0, 1), gamma = c(1, 0, 0, 1)) do.call(rbind, lapply(seq_len(nrow(fh_grid)), function(i) { r <- fh_grid$rho[i] g <- fh_grid$gamma[i] fast <- as.numeric(survdiff_fast(ovarian$futime, ovarian$fustat, ovarian$rx, control = 1, side = 2, weight = "fh", rho = r, gamma = g)) nph_chisq <- nph::logrank.test(ovarian$futime, ovarian$fustat, ovarian$rx, rho = r, gamma = g)$test$Chisq data.frame(rho = r, gamma = g, fast = fast, nph = nph_chisq) })) ``` The two implementations agree to numerical precision across the weight family, including the ordinary log-rank test recovered at `rho = 0, gamma = 0`. # Cox hazard ratio `coxph_fast()` returns the Pike-Halley Estimator, a closed-form approximation to the Cox partial likelihood maximizer. On data with a moderate hazard ratio the two agree closely. We report the log hazard ratio from both. ```{r cox, eval = have_survival} fast_hr <- coxph_fast(ovarian$futime, ovarian$fustat, ovarian$rx, control = 1) ref_cox <- coxph(Surv(futime, fustat) ~ I(rx == 2), data = ovarian) c(fast = unclass(fast_hr)["coef"], cox = unname(coef(ref_cox))) ``` # Restricted mean survival time `rmst_fast()` integrates the Kaplan-Meier survival curve up to a horizon. We compare the two-group RMST difference against `survRM2::rmst2()`. ```{r rmst, eval = have_survival && have_survRM2} library(survRM2) arm <- as.integer(ovarian$rx == 2) tau <- 500 fast_rmst <- rmst_fast(ovarian$futime, ovarian$fustat, ovarian$rx, control = 1, tau = tau) ref_rmst <- rmst2(time = ovarian$futime, status = ovarian$fustat, arm = arm, tau = tau) c(fast = unclass(fast_rmst)["diff"], survRM2 = ref_rmst$unadjusted.result[1, 1]) ``` # Milestone survival `milestone_fast()` compares Kaplan-Meier survival between two groups at a milestone timepoint. The per-group survival probabilities match those from `survfit()`. ```{r milestone, eval = have_survival} tstar <- 500 fast_ms <- milestone_fast(ovarian$futime, ovarian$fustat, ovarian$rx, control = 1, tau = tstar, method = "wald") fit_g <- survfit(Surv(futime, fustat) ~ rx, data = ovarian) ref_g <- summary(fit_g, times = tstar) ms_surv <- fast_ms$surv data.frame( group = c("control", "treatment"), fast = c(ms_surv["control"], ms_surv["treatment"]), survival = ref_g$surv, row.names = NULL ) ``` # Average hazard with survival weight `ahsw_fast()` computes the average hazard with survival weight of Uno and Horiguchi. We compare the per-group average hazard against `survAH::ah2()`. ```{r ahsw, eval = have_survival && have_survAH} library(survAH) arm <- as.integer(ovarian$rx == 2) tau <- 500 fast_ah <- ahsw_fast(ovarian$futime, ovarian$fustat, ovarian$rx, control = 1, tau = tau) ref_ah <- ah2(time = ovarian$futime, status = ovarian$fustat, arm = arm, tau = tau) data.frame( quantity = c("AH (control)", "AH (treatment)"), fast = unclass(fast_ah)[c("ah.ctrl", "ah.trt")], survAH = c(ref_ah$ah["AH (arm0)", "Est."], ref_ah$ah["AH (arm1)", "Est."]), row.names = NULL ) ``` The average hazard is the ratio of the cumulative event probability to the restricted mean survival time, so for the `ovarian` data, where follow-up is measured in days, the values are on the order of 1e-03 per day in both groups. # Average hazard ratio `ahr_fast()` computes the Kalbfleisch-Prentice average hazard ratio between two groups over a restricted interval. The reference implementation is `ahrKM()` from the AHR package, but that package has been archived on CRAN, so here we check the point estimates against a direct survival-based computation: each group's Kaplan-Meier curve from `survfit()`, integrated to form the group shares of the total hazard. ```{r ahr, eval = have_survival} tau <- 500 fast_ahr <- ahr_fast(ovarian$futime, ovarian$fustat, ovarian$rx, control = 1, tau = tau) g <- ovarian$rx ev <- c(ovarian$futime[g == 1 & ovarian$fustat == 1], ovarian$futime[g == 2 & ovarian$fustat == 1]) grid <- sort(unique(c(0, ev[ev <= tau], tau))) km_on_grid <- function(gi) { sf <- survfit(Surv(futime, fustat) ~ 1, data = ovarian[g == gi, ]) approxfun(sf$time, sf$surv, method = "constant", yleft = 1, rule = 2, f = 0)(grid) } S1 <- km_on_grid(1) S2 <- km_on_grid(2) m <- length(grid) dS1 <- S1 - c(1, S1[-m]) GL <- S1[m] * S2[m] ref_theta1 <- -sum(S2 * dS1) / (1 - GL) data.frame( quantity = c("theta (group 1)", "theta (group 2)", "AHR"), fast = c(fast_ahr$theta[[1]], fast_ahr$theta[[2]], fast_ahr$ahr), reference = c(ref_theta1, 1 - ref_theta1, (1 - ref_theta1) / ref_theta1), row.names = NULL ) ``` The point estimates match the survival-based reference to numerical precision. The average hazard ratio was additionally cross-checked against the archived AHR package, the reference implementation used by Dormuth et al. (2024): the two group shares, the average hazard ratio, the variances, and both test statistics agree to within the order of 1e-15. That external comparison is reproducible with the `tools/verify_ahr_fast.R` script shipped in the package sources. # Max-combo test `maxcombo_fast()` computes the max-combo test, the most extreme of a set of Fleming-Harrington weighted log-rank statistics. The `nph` package provides `logrank.maxtest()`, whose default weight set is FH(0, 0), FH(0, 1), and FH(1, 0). We request the same three weights from `maxcombo_fast()` and compare the component Z-scores. The individual Z-scores are kept in the `z` attribute, signed so that a negative value indicates benefit for the treatment group under the package convention. Because `nph` orients the contrast in the opposite direction, the signs are mirrored, so we compare absolute values. ```{r maxcombo, eval = have_survival && have_nph} rho <- c(0, 0, 1) gamma <- c(0, 1, 0) fast_mc <- maxcombo_fast(ovarian$futime, ovarian$fustat, ovarian$rx, control = 1, side = 2, rho = rho, gamma = gamma) nph_mc <- nph::logrank.maxtest(ovarian$futime, ovarian$fustat, ovarian$rx) data.frame( weight = c("FH(0,0)", "FH(0,1)", "FH(1,0)"), fast = abs(attr(fast_mc, "z")), nph = abs(nph_mc$tests$z), row.names = NULL ) ``` The component statistics agree in absolute value. The final max-combo statistic is the largest of these, which for the two-sided test is `max(abs(Z))` and is therefore always non-negative. The p-values from the two packages are close but not identical, since they use different numerical methods for the joint distribution (a multivariate normal integral here, a Bonferroni-style combination in `nph`). # Robust modestly-weighted log-rank test `rmw_fast()` computes the robust modestly-weighted (rMW) test of Magirr and Öhrn, the maximum of the standard log-rank statistic and a single modestly-weighted log-rank statistic. Each component is a weighted log-rank test, so we check the two component Z-scores against [nphRCT](https://cran.r-project.org/package=nphRCT), the implementation used by the method's authors. The modestly-weighted component uses the survival-threshold parameterization, so `s_star = 1` recovers the standard log-rank test and `s_star = 0.5` gives the modestly-weighted component. As with the max-combo test the package orients a negative Z toward treatment benefit while `nphRCT` uses the opposite sign, so we compare absolute values. ```{r rmw, eval = have_survival && have_nphRCT} ov_df <- data.frame( time = ovarian$futime, event = ovarian$fustat, arm = factor(ifelse(ovarian$rx == 1, "control", "experimental"), levels = c("control", "experimental")) ) fit_rmw <- rmw_fast(ov_df$time, ov_df$event, ov_df$arm, control = "control", side = 1, s_star = 0.5) z_lr_nph <- nphRCT::wlrt(Surv(time, event) ~ arm, data = ov_df, method = "mw", s_star = 1)$z z_mw_nph <- nphRCT::wlrt(Surv(time, event) ~ arm, data = ov_df, method = "mw", s_star = 0.5)$z data.frame( component = c("log-rank (s_star = 1)", "modestly-weighted (s_star = 0.5)"), fast = abs(attr(fit_rmw, "z")), nphRCT = abs(c(z_lr_nph, z_mw_nph)), row.names = NULL ) ``` The component statistics agree in absolute value. The null correlation of the two components, reported in the `corr` attribute, matches the covariance computed by the authors' `find_cor` routine to numerical precision (0.98 on the `ovarian` data). The combined statistic and one-sided p-value then follow from the bivariate normal distribution of the two components, evaluated with `mvtnorm::pmvnorm`. On the POPLAR overall-survival data analyzed in the original article, `rmw_fast()` reproduces the published one-sided p-values of 0.0028 for the log-rank test, 0.0009 for the modestly-weighted test, and 0.0012 for the rMW test, with a component correlation of 0.97. # Summary Across all functions the FastSurvival results reproduce the reference values from the established packages. The point estimates and test statistics agree to numerical precision for the Kaplan-Meier, log-rank, RMST, milestone, and average-hazard quantities, the average hazard ratio matches a survival-based reference, and the closed-form Cox hazard ratio agrees with the partial-likelihood maximizer to the order expected for the Pike-Halley approximation. This agreement is verified continuously by the package test suite. # References Kaplan, E. L., & Meier, P. (1958). Nonparametric estimation from incomplete observations. *Journal of the American Statistical Association*, 53(282), 457-481. Mantel, N. (1966). Evaluation of survival data and two new rank order statistics arising in its consideration. *Cancer Chemotherapy Reports*, 50(3), 163-170. Fleming, T. R., & Harrington, D. P. (1991). *Counting Processes and Survival Analysis*. New York: John Wiley & Sons. Cox, D. R. (1972). Regression models and life-tables. *Journal of the Royal Statistical Society. Series B (Methodological)*, 34(2), 187-220. Homma, G. (2025). One step from Pike to Cox: a closed-form hazard ratio estimator. *Manuscript under review.* Royston, P., & Parmar, M. K. B. (2013). Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome. *BMC Medical Research Methodology*, 13, 152. Uno, H., Claggett, B., Tian, L., et al. (2014). Moving beyond the hazard ratio in quantifying the between-group difference in survival analysis. *Journal of Clinical Oncology*, 32(22), 2380-2385. Tang, Y. (2021). Some new confidence intervals for Kaplan-Meier based estimators from one and two sample survival data. *Statistics in Medicine*, 40(23), 4961-4976. Uno, H., & Horiguchi, M. (2023). Ratio and difference of average hazard with survival weight: new measures to quantify survival benefit of new therapy. *Statistics in Medicine*, 42(7), 936-952. Kalbfleisch, J. D., & Prentice, R. L. (1981). Estimation of the average hazard ratio. *Biometrika*, 68(1), 105-112. Dormuth, I., Pauly, M., Rauch, G., & Herrmann, C. (2024). Sample size calculation under nonproportional hazards using average hazard ratios. *Biometrical Journal*, 66(6), e202300271. Karrison, T. G. (2016). Versatile tests for comparing survival curves based on weighted log-rank statistics. *The Stata Journal*, 16(3), 678-690. Lin, R. S., Lin, J., Roychoudhury, S., et al. (2020). Alternative analysis methods for time to event endpoints under nonproportional hazards: a comparative analysis. *Statistics in Biopharmaceutical Research*, 12(2), 187-198. Magirr, D., & Burman, C.-F. (2019). Modestly weighted logrank tests. *Statistics in Medicine*, 38(20), 3782-3790. Magirr, D., & Öhrn, F. (2026). Robust modestly weighted log-rank tests. *Pharmaceutical Statistics*, 25(1), e70066.