Title: | Monotonic Optimal Binning |
---|---|
Description: | Generate the monotonic binning and perform the WoE (weight of evidence) transformation for the logistic regression used in consumer credit scorecard development. The WoE transformation is a piecewise transformation that is linear in the log odds. For a numeric variable, all of its monotonic functional transformations converge to the same WoE transformation. |
Authors: | WenSui Liu |
Maintainer: | WenSui Liu <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.4.2 |
Built: | 2025-02-19 02:57:46 UTC |
Source: | https://github.com/cran/mob |
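As a rough illustration of the WoE transformation mentioned in the description, the sketch below computes WoE per bin in base R. It assumes the common convention WoE = log(share of bads in the bin / share of goods in the bin); the sign convention, the cut points and the toy data are assumptions made for illustration and are not taken from the package.

```r
# Toy data (made up): x is a numeric predictor, y is a 0/1 outcome (1 = bad).
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
y <- c(0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1)

# Hypothetical cut points; the bins are (-Inf, 4], (4, 8] and (8, Inf).
cuts <- c(4, 8)
bin <- findInterval(x, c(-Inf, cuts, Inf), left.open = TRUE)

# Count bads and goods in each bin.
bads  <- tapply(y, bin, sum)
goods <- tapply(1 - y, bin, sum)

# WoE per bin under the assumed convention log(bad share / good share).
woe <- log((bads / sum(bads)) / (goods / sum(goods)))
print(woe)
```

With these toy values the bad rate rises from bin to bin, so the WoE values increase monotonically as well.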
The function arb_bin implements the monotonic binning based on the decision tree.
arb_bin(x, y)
x | A numeric vector |
y | A numeric vector with 0/1 binary values |
A list of binning outcomes, including a numeric vector with cut points and a dataframe with binning summary
data(hmeq)
arb_bin(hmeq$DEROG, hmeq$BAD)
The function bad_bin implements the quantile-based monotonic binning by the iterative discretization based on cases with Y = 1.
bad_bin(x, y)
x | A numeric vector |
y | A numeric vector with 0/1 binary values |
A list of binning outcomes, including a numeric vector with cut points and a dataframe with binning summary
data(hmeq)
bad_bin(hmeq$DEROG, hmeq$BAD)
The function batch_bin applies multiple binning algorithms in batch to each vector in the dataframe.
batch_bin(y, xs, method = 1)
y | A numeric vector with 0/1 binary values. |
xs | A dataframe with numeric vectors to discretize. |
method | An integer from 1 to 7 selecting the binning implementation: 1 = iso_bin(), 2 = qtl_bin(), 3 = bad_bin(), 4 = rng_bin(), 5 = gbm_bin(), 6 = kmn_bin(), 7 = arb_bin(). The default is 1. |
A list of binning outcomes with two elements: bin_sum, a dataframe of binning summary, and bin_out, a list of binning output from the binning functions, e.g. qtl_bin().
data(hmeq)
batch_bin(hmeq$BAD, hmeq[, c('DEROG', 'DELINQ')])
The function batch_woe applies WoE transformations to vectors in the dataframe.
batch_woe(xs, bin_out)
xs | A dataframe with numeric vectors to discretize. |
bin_out | A binning output from the function batch_bin(). |
A dataframe with the same headers as the input xs, in which the values of each variable have been transformed to WoE values.
data(hmeq)
bin_out <- batch_bin(hmeq$BAD, hmeq[, c('DEROG', 'DELINQ')])$bin_out
head(batch_woe(hmeq[, c('DEROG', 'DELINQ')], bin_out))
The function cal_woe applies the WoE transformation to a numeric vector based on the binning outcome from a binning function, e.g. qtl_bin() or iso_bin().
cal_woe(x, bin)
x | A numeric vector that will be transformed to WoE values. |
bin | A list with the binning outcome from the binning function, e.g. qtl_bin() or iso_bin() |
A numeric vector with WoE transformed values.
data(hmeq)
bin_out <- qtl_bin(hmeq$DEROG, hmeq$BAD)
cal_woe(hmeq$DEROG[1:10], bin_out)
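Conceptually, applying a WoE transformation amounts to looking up, for each value, the WoE of the bin it falls into. The base R sketch below shows that lookup under stated assumptions: the helper apply_woe, the cut points and the WoE values are all hypothetical, and the sketch does not reproduce how cal_woe() handles missing or special values.

```r
# Hypothetical binning outcome: two cut points define three bins,
# (-Inf, 0], (0, 2] and (2, Inf), each with a made-up WoE value.
cuts <- c(0, 2)
woe_by_bin <- c(-0.5, 0.3, 1.1)

# Hypothetical helper: map each value of x to the WoE of its bin.
apply_woe <- function(x, cuts, woe_by_bin) {
  idx <- findInterval(x, c(-Inf, cuts, Inf), left.open = TRUE)
  woe_by_bin[idx]
}

apply_woe(c(0, 1, 2, 5), cuts, woe_by_bin)
# [1] -0.5  0.3  0.3  1.1
```

In this sketch a missing input simply yields NA; a production implementation would normally assign missing values a bin of their own.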
The function gbm_bin implements the monotonic binning based on the generalized boosted model (GBM).
gbm_bin(x, y)
x | A numeric vector |
y | A numeric vector with 0/1 binary values |
A list of binning outcomes, including a numeric vector with cut points and a dataframe with binning summary
data(hmeq)
gbm_bin(hmeq$DEROG, hmeq$BAD)
A dataset containing characteristics and delinquency information for 5,960 home equity loans.
hmeq
A data frame with 5960 rows and 13 variables:
BAD | Indicator of whether the applicant defaulted on the loan or was seriously delinquent |
LOAN | Amount of the loan request, in dollars |
MORTDUE | Amount due on existing mortgage, in dollars |
VALUE | Value of current property, in dollars |
REASON | DebtCon = debt consolidation; HomeImp = home improvement |
JOB | Occupational categories |
YOJ | Years at present job |
DEROG | Number of major derogatory reports |
DELINQ | Number of delinquent credit lines |
CLAGE | Age of oldest credit line in months |
NINQ | Number of recent credit inquiries |
CLNO | Number of credit lines |
DEBTINC | Debt-to-income ratio |
http://www.creditriskanalytics.net/datasets-private2.html
The function iso_bin implements the monotonic binning based on the isotonic regression.
iso_bin(x, y)
x | A numeric vector |
y | A numeric vector with 0/1 binary values |
A list of binning outcomes, including a numeric vector with cut points and a dataframe with binning summary
data(hmeq)
iso_bin(hmeq$DEROG, hmeq$BAD)
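To give a feel for how isotonic regression can produce monotonic bins, here is a minimal base R sketch using stats::isoreg(). It is not the package's implementation: the toy data are made up, and the rule of treating each run of equal fitted values as one bin is an assumption chosen for illustration.

```r
# Toy data (made up): x is already sorted, y is a 0/1 outcome.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(0, 0, 1, 0, 0, 1, 1, 0, 1, 1)

# Fit a non-decreasing step function of y on x.
fit <- isoreg(x, y)

# fit$yf holds the fitted values in increasing order of x.
# Each run of equal fitted values is treated as one bin.
runs <- rle(fit$yf)
ends <- cumsum(runs$lengths)

# Cut points are the largest x in every bin except the last one.
x_sorted <- sort(x)
cuts <- x_sorted[ends[-length(ends)]]
cuts
# [1] 2 5 8
```

The first bin in this toy example has a fitted bad rate of zero, which would make its WoE infinite; a practical binning routine has to merge or otherwise treat such bins, which this sketch leaves out.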
The function kmn_bin implements the monotonic binning based on the k-means clustering.
kmn_bin(x, y)
x | A numeric vector |
y | A numeric vector with 0/1 binary values |
A list of binning outcomes, including a numeric vector with cut points and a dataframe with binning summary
data(hmeq)
kmn_bin(hmeq$DEROG, hmeq$BAD)
The function pool_bin implements the monotonic binning for the pool data based on the generalized boosted model (GBM).
pool_bin(x, num, den, log = FALSE)
x | A numeric vector |
num | A numeric vector with integer values for numerators to calculate bad rates |
den | A numeric vector with integer values for denominators to calculate bad rates |
log | A logical constant, either TRUE or FALSE. The default is FALSE |
A list of binning outcomes, including a numeric vector with cut points and a dataframe with binning summary
data(hmeq)
df <- rbind(
  Reduce(rbind,
         lapply(split(hmeq, floor(hmeq$CLAGE)),
                function(d) data.frame(AGE = unique(floor(d$CLAGE)),
                                       NUM = sum(d$BAD),
                                       DEN = nrow(d)))),
  data.frame(AGE = NA,
             NUM = sum(hmeq[is.na(hmeq$CLAGE), ]$BAD),
             DEN = nrow(hmeq[is.na(hmeq$CLAGE), ])))
pool_bin(df$AGE, df$NUM, df$DEN, log = TRUE)
The function qcut discretizes a numeric vector into N pieces based on quantiles.
qcut(x, n)
x | A numeric vector. |
n | An integer indicating the number of categories to discretize. |
A numeric vector of cut points that divide the vector x into n categories.
x <- 1:10
# [1] 1 2 3 4 5 6 7 8 9 10
v <- qcut(1:10, 4)
# [1] 3 5 8
findInterval(x, sort(c(v, -Inf, Inf)), left.open = TRUE)
# [1] 1 1 1 2 2 3 3 3 4 4
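For readers who want to see how quantile-based cut points can be derived without the package, the sketch below uses stats::quantile() from base R. The helper name qcut_sketch and the choice of type = 1 quantiles are assumptions; that choice happens to reproduce the documented cut points for 1:10 with n = 4, but it is not confirmed to be what qcut() does internally.

```r
# Hypothetical helper: interior quantiles of x as cut points for n bins.
qcut_sketch <- function(x, n) {
  probs <- seq(0, 1, length.out = n + 1)
  probs <- probs[-c(1, n + 1)]  # drop 0 and 1, keep interior probabilities
  # type = 1 (inverse empirical CDF) is an assumption made for this sketch.
  unique(quantile(x, probs, type = 1, names = FALSE, na.rm = TRUE))
}

v <- qcut_sketch(1:10, 4)
v
# [1] 3 5 8
```

Calling unique() guards against repeated cut points when x has many ties, in which case fewer than n bins come back.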
The function qtl_bin implements the quantile-based monotonic binning by the iterative discretization.
qtl_bin(x, y)
x | A numeric vector |
y | A numeric vector with 0/1 binary values |
A list of binning outcomes, including a numeric vector with cut points and a dataframe with binning summary
data(hmeq)
qtl_bin(hmeq$DEROG, hmeq$BAD)
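One plausible reading of "iterative discretization" is to start with many quantile bins and reduce the count until the bad rate is strictly monotonic across bins. The base R sketch below follows that reading. The helper mono_qtl_sketch, the starting bin count, the type = 1 quantiles and the stopping rule are all assumptions, so it should not be taken as the algorithm inside qtl_bin().

```r
# Hypothetical helper: try n = max_bins, ..., 2 quantile bins and return the
# cut points of the first binning whose bad rate is strictly monotonic.
mono_qtl_sketch <- function(x, y, max_bins = 20) {
  for (n in max_bins:2) {
    probs <- seq(0, 1, length.out = n + 1)[-c(1, n + 1)]
    cuts <- unique(quantile(x, probs, type = 1, names = FALSE))
    cuts <- cuts[cuts < max(x)]  # avoid an empty top bin
    if (length(cuts) == 0) next
    bin <- findInterval(x, c(-Inf, cuts, Inf), left.open = TRUE)
    rate <- tapply(y, bin, mean)  # bad rate per bin
    d <- diff(rate)
    if (all(d > 0) || all(d < 0)) return(cuts)
  }
  numeric(0)  # no monotonic binning found
}

# Toy data (made up): the outcome switches from 0 to 1 halfway through.
x <- 1:20
y <- c(rep(0, 10), rep(1, 10))
cuts <- mono_qtl_sketch(x, y)
cuts
```

Stopping at the first monotonic result keeps as many bins as the data support, which is the usual motivation for searching from the largest bin count downward.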
The function rng_bin implements the quantile-based monotonic binning by the iterative discretization based on the equal-width range of values.
rng_bin(x, y)
x | A numeric vector |
y | A numeric vector with 0/1 binary values |
A list of binning outcomes, including a numeric vector with cut points and a dataframe with binning summary
data(hmeq)
rng_bin(hmeq$DEROG, hmeq$BAD)
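The "equal-width range of values" idea can be pictured as splitting the interval from min(x) to max(x) into pieces of equal length. The base R sketch below computes such candidate cut points; the helper name rng_cuts_sketch is hypothetical, and the sketch covers only the cut point step, not the monotonicity search that rng_bin() performs.

```r
# Hypothetical helper: interior boundaries of n equal-width intervals
# spanning the observed range of x.
rng_cuts_sketch <- function(x, n) {
  r <- range(x, na.rm = TRUE)
  seq(r[1], r[2], length.out = n + 1)[-c(1, n + 1)]
}

rng_cuts_sketch(c(0, 2, 10), 5)
# [1] 2 4 6 8
```

Unlike quantile-based cut points, equal-width cut points depend only on the minimum and maximum of x, so sparse regions of the range can produce bins with few or no observations.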