Package 'riskscores' reference manual

Title:	Optimized Integer Risk Score Models
Description:	Implements an optimized approach to learning risk score models, where sparsity and integer constraints are integrated into the model-fitting process.
Authors:	Hannah Eglinton [aut, cre], Alice Paul [aut, cph], Oscar Yan [aut], R Core Team [ctb, cph] (Copyright holder of Rinternals.h, R.h, lm.c, Applic.h, statsR.h, glm package), Robert Gentleman [ctb, cph] (Author and copyright holder of Rinternals.h), Ross Ihaka [ctb, cph] (Author and copyright holder of Rinternals.h), Simon Davies [ctb] (Author of glm.fit function (modified in cv_risk_mod.R)), Thomas Lumley [ctb] (Author of glm.fit function (modified in cv_risk_mod.R))
Maintainer:	Hannah Eglinton <[email protected]>
License:	GPL (>= 3)
Version:	1.1.1
Built:	2025-03-27 03:49:08 UTC
Source:	https://github.com/hjeglinton/riskscores

Breast tissue biopsy data

Description

The Breast Cancer Wisconsin dataset from the UCI machine learning repository records the measurements from breast tissue biopsies. The outcome of interest is whether the sample was benign or malignant.

Usage

breastcancer

Format

`breastcancer`

A data frame with 683 rows and 10 columns:

Benign: Outcome indicator; 1 for malignant, 0 for benign (despite the column name, 1 denotes a malignant sample)
ClumpThickness: Clump thickness on an integer scale from 1 to 10
UniformityOfCellSize: Uniformity of cell size on an integer scale from 1 to 10
UniformityofCellShape: Uniformity of cell shape on an integer scale from 1 to 10
MarginalAdhesion: Marginal adhesion on an integer scale from 1 to 10
SingleEpithelialCellSize: Single epithelial cell size on an integer scale from 1 to 10
BareNuclei: Bare nuclei on an integer scale from 1 to 10
BlandChromatin: Bland chromatin on an integer scale from 1 to 10
NormalNucleoli: Normal nucleoli on an integer scale from 1 to 10
Mitosis: Mitosis on an integer scale from 1 to 10

Source

https://archive.ics.uci.edu/dataset/15/breast+cancer+wisconsin+original

Clip Values

Description

Clip values prior to exponentiation to avoid numeric errors.

Usage

clip_exp_vals(x)

Arguments

`x`	Numeric vector.

Value

Input vector `x` with all values clipped to lie between -709.78 and 709.78.

Examples

clip_exp_vals(710)
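
To illustrate why clipping matters, the following base R sketch shows the overflow that occurs just past the bound. `clip_sketch()` is a hypothetical stand-in written for this illustration and is not the package's source code.

```r
# Hypothetical re-implementation for illustration only; not the package source.
clip_sketch <- function(x, limit = 709.78) {
  # Clamp every element to the interval [-limit, limit]
  pmin(pmax(x, -limit), limit)
}

raw_val     <- exp(710)               # overflows in double precision
clipped_val <- exp(clip_sketch(710))  # stays finite after clipping
clipped_vec <- clip_sketch(c(-1000, 0, 1000))

is.infinite(raw_val)
is.finite(clipped_val)
clipped_vec
```

The bound sits just below `log(.Machine$double.xmax)`, the largest exponent for which `exp()` still returns a finite double.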

Extract Model Coefficients

Description

Extracts a vector of model coefficients (both nonzero and zero) from a "risk_mod" object. Equivalent to accessing the beta attribute of a "risk_mod" object.

Usage

## S3 method for class 'risk_mod'
coef(object, ...)

Arguments

`object`	An object of class "risk_mod", usually a result of a call to `risk_mod()`.
`...`	Additional arguments.

Value

Numeric vector with coefficients.

Examples

y <- breastcancer[[1]]
X <- as.matrix(breastcancer[,2:ncol(breastcancer)])

mod <- risk_mod(X, y, lambda0 = 0.01)
coef(mod)

Run Cross-Validation to Tune Lambda0

Description

Runs k-fold cross-validation on a grid of $\lambda_0$ values. Records class accuracy and deviance for each $\lambda_0$. Returns an object of class "cv_risk_mod".

Usage

cv_risk_mod(
  X,
  y,
  weights = NULL,
  beta = NULL,
  a = -10,
  b = 10,
  max_iters = 100,
  tol = 1e-05,
  nlambda = 25,
  lambda_min_ratio = ifelse(nrow(X) < ncol(X), 0.01, 1e-04),
  lambda0 = NULL,
  nfolds = 10,
  foldids = NULL,
  parallel = FALSE,
  shuffle = TRUE,
  seed = NULL
)

Arguments

`X`	Input covariate matrix with dimension $n \times p$ ; every row is an observation.
`y`	Numeric vector for the (binomial) response variable.
`weights`	Numeric vector of length $n$ with weights for each observation. Unless otherwise specified, default will give equal weight to each observation.
`beta`	Starting numeric vector with $p$ coefficients. Default starting coefficients are rounded coefficients from a logistic regression model.
`a`	Integer lower bound for coefficients (default: -10).
`b`	Integer upper bound for coefficients (default: 10).
`max_iters`	Maximum number of iterations (default: 100).
`tol`	Tolerance for convergence (default: 1e-5).
`nlambda`	Number of lambda values to try (default: 25).
`lambda_min_ratio`	Smallest value for lambda, as a fraction of lambda_max (the smallest value for which all coefficients are zero). The default depends on the sample size ($n$) relative to the number of variables ($p$). If $n \ge p$, the default is 0.0001, close to zero. If $n < p$, the default is 0.01.
`lambda0`	Optional sequence of lambda values. By default, the function will derive the lambda0 sequence based on the data (see `lambda_min_ratio`).
`nfolds`	Number of folds, implied if `foldids` provided (default: 10).
`foldids`	Optional vector of values between 1 and `nfolds`.
`parallel`	If `TRUE`, parallel processing (using foreach) is implemented during cross-validation to increase efficiency (default: `FALSE`). User must first register parallel backend with a function such as doParallel::registerDoParallel.
`shuffle`	Whether order of coefficients is shuffled during coordinate descent (default: TRUE).
`seed`	An integer that is used as argument by `set.seed()` for offsetting the random number generator. Default is to not set a particular randomization seed.
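
As a rough illustration of how `nlambda` and `lambda_min_ratio` could define a grid, the sketch below builds a log-spaced sequence in the style used by glmnet. The log spacing and the value of `lambda_max_hyp` are assumptions made for this example; the package may construct its `lambda0` sequence differently.

```r
# Dimensions of the breastcancer predictors: 683 rows, 9 covariates
n <- 683
p <- 9

# Default ratio as given in the Usage signature
lambda_min_ratio <- ifelse(n < p, 0.01, 1e-04)

# Hypothetical lambda_max; in practice it would be derived from the data
lambda_max_hyp <- 0.5
nlambda <- 25

# Assumed log-spaced grid from lambda_max down to lambda_max * ratio
lambda_grid <- exp(seq(log(lambda_max_hyp),
                       log(lambda_max_hyp * lambda_min_ratio),
                       length.out = nlambda))
range(lambda_grid)
```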

Value

An object of class "cv_risk_mod" with the following attributes:

`results`	Dataframe containing a summary of deviance and accuracy for each value of `lambda0` (mean and SD). Also includes the number of nonzero coefficients that are produced by each `lambda0` when fit on the full data.
`lambda_min`	Numeric value indicating the `lambda0` that resulted in the lowest mean deviance.
`lambda_1se`	Numeric value indicating the largest `lambda0` that had a mean deviance within one standard error of the mean deviance at `lambda_min`.
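
The selection of `lambda_min` and `lambda_1se` can be sketched on a toy summary table. The column names (`mean_dev`, `se_dev`) and all numbers below are made up for illustration and are not taken from the package or from real results.

```r
# Toy cross-validation summary; values are invented purely for illustration
cv_summary <- data.frame(
  lambda0  = c(0.001, 0.01, 0.05, 0.1),
  mean_dev = c(0.35, 0.28, 0.29, 0.40),
  se_dev   = c(0.02, 0.02, 0.02, 0.03)
)

# lambda_min: the lambda0 with the lowest mean deviance
best <- which.min(cv_summary$mean_dev)
lambda_min <- cv_summary$lambda0[best]

# lambda_1se: the largest lambda0 whose mean deviance is within
# one standard error of the minimum
cutoff <- cv_summary$mean_dev[best] + cv_summary$se_dev[best]
lambda_1se <- max(cv_summary$lambda0[cv_summary$mean_dev <= cutoff])

c(lambda_min = lambda_min, lambda_1se = lambda_1se)
```

Choosing `lambda_1se` over `lambda_min` trades a small amount of deviance for a larger penalty, which typically yields a sparser model.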

Run Cross-Validation to Tune Lambda0 with Random Start

Description

Runs k-fold cross-validation on a grid of $\lambda_0$ values using random warm starts (see `risk_mod_random_start()`). Records class accuracy and deviance for each $\lambda_0$. Returns an object of class "cv_risk_mod".

Usage

cv_risk_mod_random_start(
  X,
  y,
  weights = NULL,
  a = -10,
  b = 10,
  max_iters = 100,
  tol = 1e-05,
  nlambda = 25,
  lambda_min_ratio = ifelse(nrow(X) < ncol(X), 0.01, 1e-04),
  lambda0 = NULL,
  nfolds = 10,
  foldids = NULL,
  parallel = FALSE,
  seed = NULL,
  nstart = 5
)

Arguments

`X`	Input covariate matrix with dimension $n \times p$ ; every row is an observation.
`y`	Numeric vector for the (binomial) response variable.
`weights`	Numeric vector of length $n$ with weights for each observation. Unless otherwise specified, default will give equal weight to each observation.
`a`	Integer lower bound for coefficients (default: -10).
`b`	Integer upper bound for coefficients (default: 10).
`max_iters`	Maximum number of iterations (default: 100).
`tol`	Tolerance for convergence (default: 1e-5).
`nlambda`	Number of lambda values to try (default: 25).
`lambda_min_ratio`	Smallest value for lambda, as a fraction of lambda_max (the smallest value for which all coefficients are zero). The default depends on the sample size ($n$) relative to the number of variables ($p$). If $n \ge p$, the default is 0.0001, close to zero. If $n < p$, the default is 0.01.
`lambda0`	Optional sequence of lambda values. By default, the function will derive the lambda0 sequence based on the data (see `lambda_min_ratio`).
`nfolds`	Number of folds, implied if `foldids` provided (default: 10).
`foldids`	Optional vector of values between 1 and `nfolds`.
`parallel`	If `TRUE`, parallel processing (using foreach) is implemented during cross-validation to increase efficiency (default: `FALSE`). User must first register parallel backend with a function such as doParallel::registerDoParallel.
`seed`	An integer that is used as argument by `set.seed()` for offsetting the random number generator. Default is to not set a particular randomization seed.
`nstart`	Number of different random starts to try (default: 5).

Get Model Metrics

Description

Calculates a risk model's accuracy, sensitivity, and specificity given a set of data.

Usage

get_metrics(
  mod,
  X = NULL,
  y = NULL,
  weights = NULL,
  threshold = NULL,
  threshold_type = c("response", "score")
)

Arguments

`mod`	An object of class `risk_mod`, usually a result of a call to `risk_mod()`.
`X`	Input covariate matrix with dimension $n \times p$ ; every row is an observation.
`y`	Numeric vector for the (binomial) response variable.
`weights`	Numeric vector of length $n$ with weights for each observation. Unless otherwise specified, default will give equal weight to each observation.
`threshold`	Numeric vector of classification threshold values used to calculate the accuracy, sensitivity, and specificity of the model. Defaults to a range of risk probability thresholds from 0.1 to 0.9 by 0.1.
`threshold_type`	Defines whether the `threshold` vector contains risk probability values ("response") or threshold values expressed as scores from the risk score model ("score"). Default: "response".

Value

Data frame with accuracy, sensitivity, and specificity for each threshold.

Examples

y <- breastcancer[[1]]
X <- as.matrix(breastcancer[,2:ncol(breastcancer)])

mod <- risk_mod(X, y)
get_metrics(mod, X, y)

get_metrics(mod, X, y, threshold = c(150, 175, 200), threshold_type = "score")
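
For readers who want to see what the three metrics measure, here is a minimal base R sketch computed from predicted probabilities at a single threshold. `metrics_sketch()` is a hypothetical helper, it ignores observation weights, and the `>=` comparison is an assumption rather than the package's documented convention.

```r
# Hypothetical helper for illustration; not the package implementation
metrics_sketch <- function(prob, y, threshold = 0.5) {
  # Classify as positive when the predicted risk reaches the threshold
  pred <- as.integer(prob >= threshold)
  tp <- sum(pred == 1 & y == 1)  # true positives
  tn <- sum(pred == 0 & y == 0)  # true negatives
  fp <- sum(pred == 1 & y == 0)  # false positives
  fn <- sum(pred == 0 & y == 1)  # false negatives
  c(accuracy    = (tp + tn) / length(y),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp))
}

# Toy predictions and outcomes, invented for this example
prob_toy <- c(0.9, 0.8, 0.4, 0.3, 0.2, 0.6)
y_toy    <- c(1, 1, 1, 0, 0, 0)
m <- metrics_sketch(prob_toy, y_toy, threshold = 0.5)
m
```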

Get Model Metrics for a Single Threshold

Description

Calculates a risk model's deviance, accuracy, sensitivity, and specificity given a set of data and a threshold value.

Usage

get_metrics_internal(
  mod,
  X = NULL,
  y = NULL,
  weights = NULL,
  threshold = 0.5,
  threshold_type = c("response", "score")
)

Arguments

`mod`	An object of class `risk_mod`, usually a result of a call to `risk_mod()`.
`X`	Input covariate matrix with dimension $n \times p$ ; every row is an observation.
`y`	Numeric vector for the (binomial) response variable.
`weights`	Numeric vector of length $n$ with weights for each observation. Unless otherwise specified, default will give equal weight to each observation.
`threshold`	Numeric classification threshold value used to calculate the accuracy, sensitivity, and specificity of the model (default: 0.5).
`threshold_type`	Defines whether `threshold` is a risk probability value ("response") or a threshold value expressed as a score from the risk score model ("score"). Default: "response".

Value

List with deviance (dev), accuracy (acc), sensitivity (sens), and specificity (spec).

Calculate Risk Probability from Score

Description

Returns the risk probabilities for the provided score value(s).

Usage

get_risk(object, score)

Arguments

`object`	An object of class "risk_mod", usually a result of a call to `risk_mod()`.
`score`	Numeric vector with score value(s).

Value

Numeric vector with the same length as score.

Examples

y <- breastcancer[[1]]
X <- as.matrix(breastcancer[,2:ncol(breastcancer)])

mod <- risk_mod(X, y)
get_risk(mod, score = c(1, 10, 20))
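
The mapping from score to risk can be sketched with the logistic function. This sketch assumes the log-odds take the form gamma * (intercept + score); that form, the helper `risk_from_score()`, and the values of `gamma_hyp` and `intercept_hyp` are assumptions for illustration, not values or code from the package.

```r
# Hypothetical scaling factor and intercept, chosen for a tidy example
gamma_hyp     <- 0.1
intercept_hyp <- -20

# Assumed relationship: log-odds = gamma * (intercept + score)
risk_from_score <- function(score, gamma, intercept) {
  stats::plogis(gamma * (intercept + score))
}

# A small lookup table pairing each score with its risk
score_map_sketch <- data.frame(
  score = c(1, 10, 20),
  risk  = risk_from_score(c(1, 10, 20), gamma_hyp, intercept_hyp)
)
score_map_sketch
```

With a positive scaling factor, risk increases monotonically in the score, and a score that cancels the intercept maps to a risk of 0.5.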

Calculate Score from Risk Probability

Description

Returns the score(s) for the provided risk probabilities.

Usage

get_score(object, risk)

Arguments

`object`	An object of class "risk_mod", usually a result of a call to `risk_mod()`.
`risk`	Numeric vector with probability value(s).

Value

Numeric vector with the same length as risk.

Examples

y <- breastcancer[[1]]
X <- as.matrix(breastcancer[,2:ncol(breastcancer)])

mod <- risk_mod(X, y)
get_score(mod, risk = c(0.25, 0.50, 0.75))
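
Going from risk back to score amounts to inverting the logistic function. The sketch below assumes log-odds of the form gamma * (intercept + score); the helper `score_from_risk()` and the parameter values are hypothetical and serve only to show the algebra.

```r
# Hypothetical scaling factor and intercept for illustration
gamma_hyp     <- 0.1
intercept_hyp <- -20

# Invert logit(risk) = gamma * (intercept + score) for the score
score_from_risk <- function(risk, gamma, intercept) {
  stats::qlogis(risk) / gamma - intercept
}

risks  <- c(0.25, 0.50, 0.75)
scores <- score_from_risk(risks, gamma_hyp, intercept_hyp)

# Round trip: mapping the scores forward recovers the risks
recovered <- stats::plogis(gamma_hyp * (intercept_hyp + scores))
cbind(risk = risks, score = scores, recovered = recovered)
```

Scores obtained this way are generally not integers, since an arbitrary risk level need not correspond to an attainable integer score.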

Plot Risk Score Cross-Validation Results

Description

Plots the mean deviance for each $\lambda_0$ tested during cross-validation.

Usage

## S3 method for class 'cv_risk_mod'
plot(x, ...)

Arguments

`x`	An object of class "cv_risk_mod", usually a result of a call to `cv_risk_mod()`.
`...`	Additional arguments affecting the plot produced.

Value

Object of class "ggplot".

Plot Risk Score Model Curve

Description

Plots the logistic regression curve associated with the integer risk score model, with scores on the x-axis and risk on the y-axis.

Usage

## S3 method for class 'risk_mod'
plot(x, score_min = NULL, score_max = NULL, ...)

Arguments

`x`	An object of class "risk_mod", usually a result of a call to `risk_mod()`.
`score_min`	The minimum score displayed on the x-axis. The default is the minimum score predicted from the model's training data.
`score_max`	The maximum score displayed on the x-axis. The default is the maximum score predicted from the model's training data.
`...`	Additional arguments affecting the plot produced.

Value

Object of class "ggplot".

Examples

y <- breastcancer[[1]]
X <- as.matrix(breastcancer[,2:ncol(breastcancer)])
mod <- risk_mod(X, y, lambda0 = 0.01)

plot(mod)

Predict Method for Risk Model Fits

Description

Obtains predictions from risk score models.

Usage

## S3 method for class 'risk_mod'
predict(object, newx = NULL, type = c("link", "response", "score"), ...)

Arguments

`object`	An object of class "risk_mod", usually a result of a call to `risk_mod()`.
`newx`	Optional matrix of new values for `X` for which predictions are to be made. If omitted, the fitted values are used.
`type`	The type of prediction required. The default ("link") is on the scale of the linear predictors (i.e. log-odds); the "response" type is on the scale of the response variable (i.e. risk probabilities); the "score" type returns the risk score calculated from the integer model.
`...`	Additional arguments.

Value

Numeric vector of predicted values.

Examples

y <- breastcancer[[1]]
X <- as.matrix(breastcancer[,2:ncol(breastcancer)])
mod <- risk_mod(X, y, lambda0 = 0.01)
predict(mod, type = "link")[1]
predict(mod, type = "response")[1]
predict(mod, type = "score")[1]
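
How the three prediction scales relate can be sketched by hand for a toy integer model. The point values, `intercept_hyp`, and `gamma_hyp` below are hypothetical, and the sketch assumes the score is the sum of points without the intercept and that the log-odds equal gamma * (intercept + score).

```r
# Hypothetical integer points for two of the breastcancer covariates
points_hyp    <- c(ClumpThickness = 2, BareNuclei = 1)
intercept_hyp <- -15
gamma_hyp     <- 0.5

# Two new observations (one per row)
newx <- rbind(c(ClumpThickness = 5, BareNuclei = 10),
              c(1, 1))

# "score": sum of integer points, assumed to exclude the intercept
score <- as.vector(newx %*% points_hyp)

# "link": log-odds under the assumed scaling
link <- gamma_hyp * (intercept_hyp + score)

# "response": risk probability, the inverse logit of the link
response <- stats::plogis(link)

data.frame(score = score, link = link, response = response)
```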

Fit an Integer Risk Score Model

Description

Fits an optimized integer risk score model using a cyclical coordinate descent algorithm. Returns an object of class "risk_mod".

Usage

risk_mod(
  X,
  y,
  gamma = NULL,
  beta = NULL,
  weights = NULL,
  lambda0 = 0,
  a = -10,
  b = 10,
  max_iters = 100,
  tol = 1e-05,
  shuffle = TRUE,
  seed = NULL
)

Arguments

`X`	Input covariate matrix with dimension $n \times p$ ; every row is an observation.
`y`	Numeric vector for the (binomial) response variable.
`gamma`	Starting value to rescale coefficients for prediction (optional).
`beta`	Starting numeric vector with $p$ coefficients. Default starting coefficients are rounded coefficients from a logistic regression model.
`weights`	Numeric vector of length $n$ with weights for each observation. Unless otherwise specified, default will give equal weight to each observation.
`lambda0`	Penalty coefficient for L0 term (default: 0). See `cv_risk_mod()` for `lambda0` tuning.
`a`	Integer lower bound for coefficients (default: -10).
`b`	Integer upper bound for coefficients (default: 10).
`max_iters`	Maximum number of iterations (default: 100).
`tol`	Tolerance for convergence (default: 1e-5).
`shuffle`	Whether order of coefficients is shuffled during coordinate descent (default: TRUE).
`seed`	An integer that is used as argument by `set.seed()` for offsetting the random number generator. Default is to not set a particular randomization seed.

Details

This function uses a cyclical coordinate descent algorithm to solve the following optimization problem.

$\min_{\gamma,\beta} \quad -\frac{1}{n} \sum_{i=1}^{n} \left(\gamma y_i x_i^T \beta - \log(1 + \exp(\gamma x_i^T \beta))\right) + \lambda_0 \sum_{j=1}^{p} 1(\beta_{j} \neq 0)$

$a \le \beta_j \le b \; \; \; \forall j = 1,2,...,p$

$\beta_j \in \mathbb{Z} \; \; \; \forall j = 1,2,...,p$

$\beta_0, \gamma \in \mathbb{R}$

Here $a$ and $b$ are the coefficient bounds given by the arguments `a` and `b`. The $\lambda_0$ penalty encourages a sparse model, and the constraints restrict the coefficients to bounded integers.
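
The objective can be evaluated directly for a candidate solution. The sketch below reads it as the mean negative log-likelihood plus an L0 penalty, which is an assumption of this example; `objective_sketch()` is a hypothetical helper, the data are invented, and `X` is assumed to carry a leading column of ones so that the first coefficient is the intercept.

```r
# Hypothetical helper for illustration; not the package implementation
objective_sketch <- function(X, y, gamma, beta, lambda0) {
  # Scaled linear predictor
  eta <- gamma * as.vector(X %*% beta)
  # log(1 + exp(eta)), computed in a numerically stable way
  log1pexp <- ifelse(eta > 0, eta + log1p(exp(-eta)), log1p(exp(eta)))
  # Mean negative log-likelihood of the logistic model
  nll <- -mean(y * eta - log1pexp)
  # L0 penalty counts nonzero coefficients, leaving out the intercept
  nll + lambda0 * sum(beta[-1] != 0)
}

# Toy data: intercept column plus two covariates
X_toy <- cbind(1, c(1, 2, 3, 4), c(0, 1, 0, 1))
y_toy <- c(0, 0, 1, 1)

obj_null <- objective_sketch(X_toy, y_toy, gamma = 1,
                             beta = c(0, 0, 0), lambda0 = 0.01)
obj_fit  <- objective_sketch(X_toy, y_toy, gamma = 1,
                             beta = c(-5, 2, 0), lambda0 = 0.01)
c(null = obj_null, fit = obj_fit)
```

The all-zero model pays no penalty but fits poorly, while the second candidate pays for one nonzero coefficient and still attains a lower objective on this toy data.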

Value

An object of class "risk_mod" with the following attributes:

`gamma`	Final scalar value.
`beta`	Vector of integer coefficients.
`glm_mod`	Logistic regression object of class "glm" (see stats::glm).
`X`	Input covariate matrix.
`y`	Input response vector.
`weights`	Input weights.
`lambda0`	Input `lambda0` value.
`model_card`	Dataframe displaying the nonzero integer coefficients (i.e. "points") of the risk score model.
`score_map`	Dataframe containing a column of possible scores and a column with each score's associated risk probability.

Examples

y <- breastcancer[[1]]
X <- as.matrix(breastcancer[,2:ncol(breastcancer)])

mod1 <- risk_mod(X, y)
mod1$model_card

mod2 <- risk_mod(X, y, lambda0 = 0.01)
mod2$model_card

mod3 <- risk_mod(X, y, lambda0 = 0.01, a = -5, b = 5)
mod3$model_card

Run Risk Model with Random Start

Description

Runs `nstart` iterations of `risk_mod()`, each with a different warm start, and selects the best model. Each starting coefficient is randomly selected as -1, 0, or 1.

Usage

risk_mod_random_start(
  X,
  y,
  weights = NULL,
  lambda0 = 0,
  a = -10,
  b = 10,
  max_iters = 100,
  tol = 1e-05,
  seed = NULL,
  nstart = 5
)

Arguments

`X`	Input covariate matrix with dimension $n \times p$ ; every row is an observation.
`y`	Numeric vector for the (binomial) response variable.
`weights`	Numeric vector of length $n$ with weights for each observation. Unless otherwise specified, default will give equal weight to each observation.
`lambda0`	Penalty coefficient for L0 term (default: 0). See `cv_risk_mod()` for `lambda0` tuning.
`a`	Integer lower bound for coefficients (default: -10).
`b`	Integer upper bound for coefficients (default: 10).
`max_iters`	Maximum number of iterations (default: 100).
`tol`	Tolerance for convergence (default: 1e-5).
`seed`	An integer that is used as argument by `set.seed()` for offsetting the random number generator. Default is to not set a particular randomization seed.
`nstart`	Number of different random starts to try (default: 5).
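
Drawing the random warm starts can be sketched in base R as follows. This only illustrates sampling each starting coefficient from -1, 0, and 1; how the package treats the intercept and how it compares the resulting fits are not shown here, and the variable names are hypothetical.

```r
set.seed(42)  # fixed seed so the draws are reproducible

p      <- 9   # number of covariates, as in the breastcancer data
nstart <- 5   # number of random starts

# One column per start; each entry drawn uniformly from {-1, 0, 1}
starts <- replicate(nstart, sample(c(-1, 0, 1), p, replace = TRUE))
dim(starts)
```

Each column could then be supplied as the `beta` starting vector of a separate `risk_mod()` fit.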

Generate Stratified Fold IDs

Description

Returns a vector of fold IDs that preserves class proportions.

Usage

stratify_folds(y, nfolds = 10, seed = NULL)

Arguments

`y`	Numeric vector for the (binomial) response variable.
`nfolds`	Number of folds (default: 10).
`seed`	An integer that is used as argument by `set.seed()` for offsetting the random number generator. Default is to not set a particular randomization seed.

Value

Numeric vector with the same length as y.

Examples

y <- rbinom(100, 1, 0.3)
foldids <- stratify_folds(y, nfolds = 5)
table(y, foldids)
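
One common way to obtain folds that preserve class proportions is to shuffle within each class and deal out fold labels in turn. The sketch below shows that idea in base R; `stratify_sketch()` is a hypothetical helper and is not claimed to match the package's algorithm.

```r
# Hypothetical helper for illustration; not the package implementation
stratify_sketch <- function(y, nfolds = 10) {
  foldids <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # Shuffle the positions of this class
    idx <- idx[sample.int(length(idx))]
    # Deal fold labels 1..nfolds round-robin within the class
    foldids[idx] <- rep_len(seq_len(nfolds), length(idx))
  }
  foldids
}

set.seed(1)
y_toy <- rep(c(0, 1), times = c(70, 30))
folds <- stratify_sketch(y_toy, nfolds = 5)
fold_table <- table(y_toy, folds)
fold_table
```

Because both class counts are divisible by the number of folds here, every fold receives the same number of observations from each class.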

Summarize Risk Model Fit

Description

Prints text that summarizes "risk_mod" objects.

Usage

## S3 method for class 'risk_mod'
summary(object, ...)

Arguments

`object`	An object of class "risk_mod", usually a result of a call to `risk_mod()`.
`...`	Additional arguments affecting the summary produced.

Value

Printed text with intercept, nonzero coefficients, gamma, lambda, and deviance.

Examples

y <- breastcancer[[1]]
X <- as.matrix(breastcancer[,2:ncol(breastcancer)])

mod <- risk_mod(X, y, lambda0 = 0.01)
summary(mod)
