Subsampling Based on Leverage Scores — Leverage • dbsubsampling

A subsampling method based on leverage scores, proposed by Ma et al. (2015).

Usage

Leverage(X, n, shrinkage_alpha = 1, replace = TRUE, seed = NULL)

Arguments

X: A data.frame or matrix of explanatory variables.
n: Subsample size.
shrinkage_alpha: Shrinkage parameter for shrinkage leveraging (SLEV). Defaults to 1 (no shrinkage).
replace: Whether to sample with replacement (TRUE) or without replacement (FALSE). Defaults to TRUE.
seed: Random seed for the sampling.
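
To make the role of shrinkage_alpha concrete, the following is a minimal base-R sketch of shrinkage leveraging as described by Ma et al. (2015), where the sampling probability of row i is alpha * h_ii / sum(h) + (1 - alpha) / N and h_ii is the i-th leverage score. It is not taken from the dbsubsampling source: the function name `leverage_sketch` is hypothetical, and the sketch assumes X is numeric with full column rank and that no intercept column is added.

```r
# Hypothetical illustration of SLEV sampling; not the package's implementation.
leverage_sketch <- function(X, n, shrinkage_alpha = 1, replace = TRUE, seed = NULL) {
  X <- as.matrix(X)
  N <- nrow(X)

  # Leverage scores are the diagonal of the hat matrix X (X'X)^{-1} X'.
  # With a thin QR decomposition X = QR they equal the row sums of Q^2.
  Q <- qr.Q(qr(X))
  h <- rowSums(Q^2)

  # Mix the leverage-based probabilities with uniform probabilities.
  # shrinkage_alpha = 1 gives pure leverage sampling; 0 gives uniform sampling.
  p_lev <- h / sum(h)
  p <- shrinkage_alpha * p_lev + (1 - shrinkage_alpha) / N

  if (!is.null(seed)) set.seed(seed)
  sample.int(N, size = n, replace = replace, prob = p)
}

# Example call on simulated data.
set.seed(42)
X_sim <- matrix(rnorm(200 * 3), nrow = 200, ncol = 3)
idx_sim <- leverage_sketch(X_sim, n = 20, shrinkage_alpha = 0.9, seed = 1)
```

Mixing in the uniform component keeps every row's sampling probability at or above (1 - alpha) / N, so rows with very small leverage scores can still be selected.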

Value

A vector of length n containing the row indices of X selected for the subsample.

References

Ping Ma, Michael W. Mahoney & Bin Yu (2015) A Statistical Perspective on Algorithmic Leveraging, Journal of Machine Learning Research, 16(27), 861–911, https://jmlr.csail.mit.edu/papers/v16/ma15a.html.

Examples

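library(dbsubsampling)
data <- data_numeric_regression
# Keep only the explanatory variables by dropping the response column "y".
X <- data[names(data) != "y"]
# Draw 100 row indices with replacement; with seed = NULL the result varies between runs.
Leverage(X, n = 100, shrinkage_alpha = 0.9, replace = TRUE, seed = NULL)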
#>   [1] 7832 9572 9449 8140 4219   13 1633 4363 4384 7691 1287 6985 3433 8341 1439
#>  [16] 7331 2455 4514 5427  145 1404 1688 2737 7643  295   40 6636 6488 6886 6970
#>  [31] 4710 3102 5731 1490 5825 2104 6841 8521 2040 8216 1950 1857 9049 5673 1659
#>  [46] 4878 9147 7663 7100 1116 4946  396  583 9024 3189 4994 8249 8009 2135 6287
#>  [61] 8985 4234 8841 3439 8805 8841 5666 2354 6163 4302 8139 8360 2518 6798   67
#>  [76] 9302 7260 2172 2584 1882 6943 8431 6800 2404 1049 9303 4247 6289 4385 6192
#>  [91] 6121 3386 4443 6626 8517 6828 2541 3076 4328 2288
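
The returned indices can be used to subset the original rows before fitting a model. The sketch below shows this on simulated data with base R only. The column names x1, x2 and y are hypothetical, and `idx` is a stand-in drawn with sample.int() rather than actual Leverage() output. The fit is unweighted. Ma et al. (2015) discuss both weighted and unweighted estimators, and which one is appropriate depends on the analysis.

```r
# Simulated stand-in data; the column names are hypothetical.
set.seed(1)
N <- 1000
data_sim <- data.frame(x1 = rnorm(N), x2 = rnorm(N))
data_sim$y <- 1 + 2 * data_sim$x1 - data_sim$x2 + rnorm(N)

# Stand-in for the index vector that Leverage() would return.
idx <- sample.int(N, size = 100, replace = TRUE)

# Rows drawn more than once appear more than once in the subsample.
sub <- data_sim[idx, ]

# Ordinary least squares on the subsample only.
fit <- lm(y ~ x1 + x2, data = sub)
coef(fit)
```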