synthcity.plugins.core.models.survival_analysis.third_party.metrics module

brier_score(survival_train: numpy.ndarray, survival_test: numpy.ndarray, estimate: numpy.ndarray, times: numpy.ndarray) → numpy.ndarray

Estimate the time-dependent Brier score for right censored data.

The time-dependent Brier score is the mean squared error at time point \(t\):

\[\mathrm{BS}^c(t) = \frac{1}{n} \sum_{i=1}^n I(y_i \leq t \land \delta_i = 1) \frac{(0 - \hat{\pi}(t | \mathbf{x}_i))^2}{\hat{G}(y_i)} + I(y_i > t) \frac{(1 - \hat{\pi}(t | \mathbf{x}_i))^2}{\hat{G}(t)} ,\]

where \(\hat{\pi}(t | \mathbf{x})\) is the predicted probability of remaining event-free up to time point \(t\) for a feature vector \(\mathbf{x}\), and \(1/\hat{G}(t)\) is an inverse probability of censoring weight, with \(\hat{G}\) estimated by the Kaplan-Meier estimator.

See the User Guide and [1] for details.
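The formula above can be traced by hand with a minimal NumPy sketch. The helper names `censoring_km` and `brier_at` are hypothetical and not part of this module, and the Kaplan-Meier step is deliberately simplified (no special handling of tied event and censoring times), so results may differ from the library on data with ties.

```python
import numpy as np


def censoring_km(event, time):
    """Simplified Kaplan-Meier estimate of the censoring survivor function G(t).

    Censored samples (event == False) act as the "events" of the censoring
    distribution. Tied event/censoring times get no special treatment.
    """
    event = np.asarray(event, dtype=bool)
    time = np.asarray(time, dtype=float)
    cens_times = np.unique(time[~event])  # sorted distinct censoring times
    at_risk = np.array([(time >= u).sum() for u in cens_times], dtype=float)
    n_cens = np.array([((time == u) & ~event).sum() for u in cens_times], dtype=float)
    # Prepend 1.0 so that G(t) = 1 before the first censoring time.
    steps = np.concatenate(([1.0], np.cumprod(1.0 - n_cens / at_risk)))

    def g_hat(t):
        # Count of censoring times <= t selects the current step of G.
        return steps[np.searchsorted(cens_times, t, side="right")]

    return g_hat


def brier_at(event_train, time_train, event_test, time_test, surv_prob, t):
    """Time-dependent Brier score at a single time point t (illustrative)."""
    g_hat = censoring_km(event_train, time_train)
    event_test = np.asarray(event_test, dtype=bool)
    time_test = np.asarray(time_test, dtype=float)
    surv_prob = np.asarray(surv_prob, dtype=float)

    had_event = (time_test <= t) & event_test  # I(y_i <= t and delta_i = 1)
    event_free = time_test > t                 # I(y_i > t)

    # Samples censored before t keep weight 0 and drop out of the sum.
    weights = np.zeros_like(surv_prob)
    weights[had_event] = 1.0 / g_hat(time_test[had_event])
    weights[event_free] = 1.0 / g_hat(t)

    target = event_free.astype(float)  # 0 once the event occurred, 1 otherwise
    return float(np.mean(weights * (target - surv_prob) ** 2))


# Toy data: five subjects, the second one is censored at time 2.
event = [True, False, True, True, True]
time = [1.0, 2.0, 3.0, 4.0, 5.0]
surv_prob_at_t = [0.2, 0.5, 0.4, 0.7, 0.9]  # predicted P(event-free at t)
score = brier_at(event, time, event, time, surv_prob_at_t, t=3.5)
```

In this toy data the censored subject contributes nothing to the sum, while every subject observed after the censoring time is up-weighted by \(1/\hat{G} = 1/0.75\).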

Parameters
  • survival_train (structured array, shape = (n_train_samples,)) – Survival times for training data to estimate the censoring distribution from. A structured array containing the binary event indicator as first field, and time of event or time of censoring as second field.

  • survival_test (structured array, shape = (n_samples,)) – Survival times of test data. A structured array containing the binary event indicator as first field, and time of event or time of censoring as second field.

  • estimate (array-like, shape = (n_samples, n_times)) – Predicted survival probabilities (not risk scores) for test data at times. The i-th column must contain the estimated probability of remaining event-free up to the i-th time point.

  • times (array-like, shape = (n_times,)) – The time points for which to estimate the Brier score. Values must be within the range of follow-up times of the test data survival_test.
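As a rough illustration of the expected input shapes, the arrays could be built with plain NumPy as follows. The field names `event` and `time` are assumptions chosen for this sketch (the parameter description only fixes the field order), and the inclusive range check is one reading of "within the range of follow-up times".

```python
import numpy as np

# Structured array: event indicator first, observed time second.
# The field names are illustrative, not mandated by the description above.
surv_dtype = [("event", bool), ("time", float)]
survival_test = np.array(
    [(True, 12.0), (False, 30.0), (True, 7.5)],
    dtype=surv_dtype,
)

# Time points at which the Brier score is requested.
times = np.array([10.0, 20.0])

# One row per test sample, one column per entry of `times`;
# each value is a predicted probability of remaining event-free.
estimate = np.array(
    [
        [0.80, 0.40],
        [0.95, 0.85],
        [0.30, 0.10],
    ]
)

# Sanity checks on shapes and on the follow-up range of the test data.
event_field, time_field = survival_test.dtype.names
follow_up = survival_test[time_field]
times_in_range = bool(np.all((times >= follow_up.min()) & (times <= follow_up.max())))
shapes_match = estimate.shape == (survival_test.shape[0], times.shape[0])
```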

Returns

brier_scores – Values of the Brier score.

Return type

array, shape = (n_times,)

Examples

>>> from sksurv.datasets import load_gbsg2
>>> from sksurv.linear_model import CoxPHSurvivalAnalysis
>>> from sksurv.metrics import brier_score
>>> from sksurv.preprocessing import OneHotEncoder

Load and prepare data.

>>> X, y = load_gbsg2()
>>> X.loc[:, "tgrade"] = X.loc[:, "tgrade"].map(len).astype(int)
>>> Xt = OneHotEncoder().fit_transform(X)

Fit a Cox model.

>>> est = CoxPHSurvivalAnalysis(ties="efron").fit(Xt, y)

Retrieve individual survival functions and get the probability of remaining event-free up to 5 years (=1825 days).

>>> survs = est.predict_survival_function(Xt)
>>> preds = [fn(1825) for fn in survs]

Compute the Brier score at 5 years.

>>> times, score = brier_score(y, y, preds, 1825)
>>> print(score)
[0.20881843]

See also

integrated_brier_score

Computes the average Brier score over all time points.

References

[1] E. Graf, C. Schmoor, W. Sauerbrei, and M. Schumacher, “Assessment and comparison of prognostic classification schemes for survival data,” Statistics in Medicine, vol. 18, no. 17-18, pp. 2529–2545, 1999.

concordance_index_censored(event_indicator: numpy.ndarray, event_time: numpy.ndarray, estimate: numpy.ndarray, tied_tol: float = 1e-08) → float

Concordance index for right-censored data.

The concordance index is defined as the proportion of all comparable pairs in which the predictions and outcomes are concordant.

Two samples are comparable if (i) both of them experienced an event (at different times), or (ii) the one with a shorter observed survival time experienced an event, in which case the event-free subject “outlived” the other. A pair is not comparable if they experienced events at the same time.

Intuitively, concordance means that the model ordered two samples correctly. More specifically, two samples are concordant if the one with the higher estimated risk score has the shorter actual survival time. When predicted risks are identical for a pair, 0.5 rather than 1 is added to the count of concordant pairs.

See the User Guide and [1] for further description.
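The pair-counting rule described above can be sketched as a quadratic-time loop. The function name `naive_cindex` is hypothetical, and the sketch only treats a pair as comparable when the earlier observed time is strictly smaller and belongs to an event, so pairs with tied times are skipped; the library routine may handle such ties differently.

```python
import numpy as np


def naive_cindex(event_indicator, event_time, estimate, tied_tol=1e-8):
    """O(n^2) concordance index for right-censored data (illustrative)."""
    event = np.asarray(event_indicator, dtype=bool)
    time = np.asarray(event_time, dtype=float)
    risk = np.asarray(estimate, dtype=float)

    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored sample cannot be the earlier member of a pair
        for j in range(n):
            if time[i] < time[j]:  # sample i had its event before j's observed time
                comparable += 1
                diff = risk[i] - risk[j]
                if abs(diff) <= tied_tol:
                    concordant += 0.5  # tied risk scores count half
                elif diff > 0:
                    concordant += 1.0  # higher risk, shorter survival
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Four subjects; the second is censored at time 2.
event = [True, False, True, True]
time = [1.0, 2.0, 3.0, 4.0]
risk = [3.0, 3.0, 1.0, 2.0]
cindex = naive_cindex(event, time, risk)
```

Here there are four comparable pairs: subject 1 against each of the other three (one tie in risk and two concordant pairs) and subject 3 against subject 4 (discordant), giving 2.5 / 4.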

Parameters
  • event_indicator (array-like, shape = (n_samples,)) – Boolean array denoting whether an event occurred

  • event_time (array-like, shape = (n_samples,)) – Array containing the time of an event or time of censoring

  • estimate (array-like, shape = (n_samples,)) – Estimated risk of experiencing an event

  • tied_tol (float, optional, default: 1e-8) – The tolerance value for considering ties. If the absolute difference between risk scores is smaller than or equal to tied_tol, risk scores are considered tied.

Returns

cindex – Concordance index

Return type

float

concordance_index_ipcw(survival_train: numpy.ndarray, survival_test: numpy.ndarray, estimate: numpy.ndarray, tau: Optional[float] = None, tied_tol: float = 1e-08) → float

Concordance index for right-censored data based on inverse probability of censoring weights.

This is an alternative to the estimator in concordance_index_censored() that does not depend on the distribution of censoring times in the test data. Therefore, the estimate is unbiased and consistent for a population concordance measure that is free of censoring.

It is based on inverse probability of censoring weights and thus requires access to survival times from the training data to estimate the censoring distribution. Note that the survival times in survival_test must lie within the range of the survival times in survival_train. This can be achieved by specifying the truncation time tau. The resulting cindex indicates how well the given prediction model works in predicting events that occur in the time range from 0 to tau.

The estimator uses the Kaplan-Meier estimator to estimate the censoring survivor function. Therefore, it is restricted to situations where the random censoring assumption holds and censoring is independent of the features.

See the User Guide and [1] for further description.
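A minimal sketch of how censoring weights and the truncation time tau could enter the pair count is given below. It assumes that each comparable pair is weighted by the squared inverse censoring probability of its earlier-failing member, a common formulation of this kind of estimator; the helper names are hypothetical, the Kaplan-Meier step ignores tied event and censoring times, and the actual implementation may differ in these details.

```python
import numpy as np


def censoring_survival(event, time):
    """Simplified Kaplan-Meier estimate of the censoring survivor function."""
    event = np.asarray(event, dtype=bool)
    time = np.asarray(time, dtype=float)
    cens_times = np.unique(time[~event])  # sorted distinct censoring times
    at_risk = np.array([(time >= u).sum() for u in cens_times], dtype=float)
    n_cens = np.array([((time == u) & ~event).sum() for u in cens_times], dtype=float)
    steps = np.concatenate(([1.0], np.cumprod(1.0 - n_cens / at_risk)))
    return lambda t: steps[np.searchsorted(cens_times, t, side="right")]


def ipcw_cindex(event_train, time_train, event_test, time_test, estimate,
                tau=None, tied_tol=1e-8):
    """Weighted concordance index (illustrative, quadratic time)."""
    g_hat = censoring_survival(event_train, time_train)
    event = np.asarray(event_test, dtype=bool)
    time = np.asarray(time_test, dtype=float)
    risk = np.asarray(estimate, dtype=float)

    num = 0.0
    den = 0.0
    for i in range(len(time)):
        # Only events before the truncation time start a comparable pair.
        if not event[i] or (tau is not None and time[i] >= tau):
            continue
        weight = 1.0 / float(g_hat(time[i])) ** 2  # assumed squared IPCW weight
        for j in range(len(time)):
            if time[i] < time[j]:
                den += weight
                diff = risk[i] - risk[j]
                if abs(diff) <= tied_tol:
                    num += 0.5 * weight
                elif diff > 0:
                    num += weight
    if den == 0.0:
        raise ValueError("no comparable pairs")
    return num / den


# Training data with one censored subject defines the censoring weights.
event_train = [True, False, True, True, True]
time_train = [1.0, 2.0, 3.0, 4.0, 5.0]

event_test = [True, False, True, True]
time_test = [1.0, 2.0, 3.0, 4.0]
risk = [3.0, 3.0, 1.0, 2.0]

cindex_full = ipcw_cindex(event_train, time_train, event_test, time_test, risk)
cindex_tau = ipcw_cindex(event_train, time_train, event_test, time_test, risk, tau=2.5)
```

With tau=2.5 only the event at time 1 starts comparable pairs, so the later, up-weighted discordant pair no longer affects the result.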

Parameters
  • survival_train (structured array, shape = (n_train_samples,)) – Survival times for training data to estimate the censoring distribution from. A structured array containing the binary event indicator as first field, and time of event or time of censoring as second field.

  • survival_test (structured array, shape = (n_samples,)) – Survival times of test data. A structured array containing the binary event indicator as first field, and time of event or time of censoring as second field.

  • estimate (array-like, shape = (n_samples,)) – Estimated risk of experiencing an event for test data.

  • tau (float, optional) – Truncation time. The survival function for the underlying censoring time distribution \(D\) needs to be positive at tau, i.e., tau should be chosen such that the probability of being censored after time tau is non-zero: \(P(D > \tau) > 0\). If None, no truncation is performed.

  • tied_tol (float, optional, default: 1e-8) – The tolerance value for considering ties. If the absolute difference between risk scores is smaller than or equal to tied_tol, risk scores are considered tied.

Returns

cindex – Concordance index

Return type

float