A subsampling method for logistic regression based on the A- and L-optimality criteria, proposed by Wang et al. (2018).

Usage

OSMAC(X, Y, r1, r2, method = c("mmse", "mvc"), seed_1 = NULL, seed_2 = NULL)

Arguments

X

A data.frame or matrix of explanatory variables.

Y

A numeric vector containing the response variable.

r1

Size of the first-stage (pilot) sample.

r2

Size of the second-stage subsample; this is the length of the returned index vector.

method

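Sampling method, one of:

  • mmse: A-optimality; minimizes the asymptotic mean squared error of the subsample estimator.

  • mvc: L-optimality; minimizes the asymptotic mean squared error of a linear transformation of the subsample estimator, which is cheaper to compute.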

seed_1

Random seed for the first stage sampling.

seed_2

Random seed for the second stage sampling.

Value

A numeric vector of length r2 containing the row indices of the selected subsample.
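
Because the return value is a set of row indices, it can be used directly to subset the original data. The snippet below is a minimal sketch of that step, assuming a data.frame of predictors and a response vector; the data are simulated and `idx` is a hypothetical placeholder for the vector that OSMAC() would return.

```r
# Simulated stand-in data (not the package's example data set).
set.seed(42)
x <- data.frame(x1 = rnorm(1000), x2 = rnorm(1000))
y <- rbinom(1000, 1, 0.5)

# Hypothetical placeholder for the output of OSMAC(X = x, Y = y, ...).
idx <- sample.int(nrow(x), 5)

# Subset predictors and response with the same indices so rows stay aligned.
x_sub <- x[idx, , drop = FALSE]
y_sub <- y[idx]
```

Note that Wang et al. (2018) fit the second-stage model with weights based on the sampling probabilities, so an unweighted fit on `x_sub` and `y_sub` should not be assumed to reproduce their estimator.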

References

HaiYing Wang, Rong Zhu and Ping Ma (2018) Optimal Subsampling for Large Sample Logistic Regression, Journal of the American Statistical Association, 113:522, 829-844, https://www.tandfonline.com/doi/full/10.1080/01621459.2017.1292914, https://github.com/Ossifragus/OSMAC.

Examples

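# Split the example data into the response and the explanatory variables.
data <- data_binary_class
y <- data[["y"]]
x <- data[names(data) != "y"]

# A-optimal subsample of 5 rows, using a pilot sample of 100 rows.
OSMAC(X = x, Y = y, r1 = 100, r2 = 5, method = "mmse", seed_1 = 123, seed_2 = 456)
#> [1] 5684 1620 5372 8297 8863
# L-optimal subsample with the same sizes and seeds.
OSMAC(X = x, Y = y, r1 = 100, r2 = 5, method = "mvc", seed_1 = 123, seed_2 = 456)
#> [1] 5813 1681 5372 8313 8863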
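
For intuition about what the two methods compute, here is a rough base-R sketch of the two-step procedure described in Wang et al. (2018). It is not the package's implementation of OSMAC(): the function `osmac_sketch()` and all names inside it are hypothetical, the pilot sample is assumed to be drawn uniformly with replacement, and no intercept column is added automatically.

```r
# Illustrative sketch only; osmac_sketch() is a hypothetical helper.
osmac_sketch <- function(X, Y, r1, r2, method = c("mmse", "mvc")) {
  method <- match.arg(method)
  X <- as.matrix(X)
  n <- nrow(X)

  # Stage 1: uniform pilot sample and an unweighted pilot logistic fit.
  pilot_idx <- sample.int(n, r1, replace = TRUE)
  Xp <- X[pilot_idx, , drop = FALSE]
  beta <- glm.fit(Xp, Y[pilot_idx], family = binomial())$coefficients

  # Fitted success probabilities for every row of the full data.
  p <- plogis(drop(X %*% beta))

  if (method == "mvc") {
    # L-optimality: score_i = |y_i - p_i| * ||x_i||.
    score <- abs(Y - p) * sqrt(rowSums(X^2))
  } else {
    # A-optimality: score_i = |y_i - p_i| * ||M^{-1} x_i||,
    # with M approximated from the pilot sample.
    wp <- p[pilot_idx] * (1 - p[pilot_idx])
    M <- crossprod(Xp * sqrt(wp)) / r1
    Z <- X %*% solve(M)  # row i holds (M^{-1} x_i)^T because M is symmetric
    score <- abs(Y - p) * sqrt(rowSums(Z^2))
  }

  # Stage 2: sample r2 rows with probability proportional to the score.
  probs <- score / sum(score)
  index <- sample.int(n, r2, replace = TRUE, prob = probs)
  list(index = index, prob = probs)
}

# Simulated data with an explicit intercept column.
set.seed(1)
n <- 2000
X <- cbind(1, matrix(rnorm(n * 2), n, 2))
Y <- rbinom(n, 1, plogis(drop(X %*% c(-0.5, 1, -1))))

res_mvc <- osmac_sketch(X, Y, r1 = 200, r2 = 10, method = "mvc")
res_mmse <- osmac_sketch(X, Y, r1 = 200, r2 = 10, method = "mmse")
res_mvc$index
```

The mvc score skips the matrix inverse, which is why it is the cheaper of the two criteria; the mmse score accounts for the information matrix at the cost of one extra solve and matrix product.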