Title: | Tools for Uplift Modeling |
---|---|
Description: | Uplift modeling aims at predicting the causal effect of an action, such as a marketing campaign, on a particular individual. To simplify the task for practitioners, we propose a combination of tools that can be separated into the following ingredients: i) quantization, ii) visualization, iii) variable selection, iv) parameter estimation, and v) model validation. For more details, please read "Belbahri, Murua, Gandouet, Partovi Nia - Uplift Regression : The R Package tools4uplift". |
Authors: | Mouloud Belbahri, Olivier Gandouet, Alejandro Murua, Vahid Partovi Nia |
Maintainer: | Mouloud Belbahri <[email protected]> |
License: | GPL-2 \| GPL-3 |
Version: | 1.0.1 |
Built: | 2025-03-05 04:02:02 UTC |
Source: | https://github.com/belbahrim/tools4uplift |
Barplot of observed uplift with respect to predicted uplift sorted from the highest to the lowest.
## S3 method for class 'PerformanceUplift'
barplot(height, ...)
height |
a table that must be the output of |
... |
additional barplot arguments. |
a barplot and the associated Kendall's uplift rank correlation
Mouloud Belbahri
Belbahri, M., Murua, A., Gandouet, O., and Partovi Nia, V. (2021) Uplift Regression : The R Package tools4uplift, <https://arxiv.org/pdf/1901.10867.pdf>
PerformanceUplift
library(tools4uplift)
data("SimUplift")
model <- BinUplift2d(SimUplift, "X1", "X2", "treat", "y")
# performance of the heat map uplift estimation on the training dataset
perf <- PerformanceUplift(data = model, treat = "treat", outcome = "y",
                          prediction = "Uplift_X1_X2",
                          equal.intervals = TRUE, nb.group = 5)
barplot(perf)
Qini-based Uplift Regression in order to select the features that maximize the Qini coefficient.
BestFeatures(data, treat, outcome, predictors, rank.precision = 2, equal.intervals = FALSE, nb.group = 10, validation = TRUE, p = 0.3)
data |
a data frame containing the treatment, the outcome and the predictors. |
treat |
name of a binary (numeric) vector representing the treatment assignment (coded as 0/1). |
outcome |
name of a binary response (numeric) vector (coded as 0/1). |
predictors |
a vector of names representing the predictors to consider in the model. |
rank.precision |
precision for the ranking quantiles to compute the Qini coefficient. Must be 1 or 2. If 1, the ranking quantiles will be rounded to the first decimal. If 2, to the second decimal. |
equal.intervals |
flag for using equal intervals (with equal number of observations) or the true ranking quantiles which result in an unequal number of observations in each group to compute the Qini coefficient. |
nb.group |
the number of groups for computing the Qini coefficient if equal.intervals is TRUE - Default is 10. |
validation |
if TRUE, the best features are selected based on cross-validation - Default is TRUE. |
p |
if validation is TRUE, the desired proportion for the validation set. p is a value between 0 and 1 expressed as a decimal; it is set to be proportional to the number of observations per group - Default is 0.3. |
The regularization parameter is chosen based on the interaction uplift model that maximizes the Qini coefficient. Using the LASSO penalty, some predictors have coefficients set to zero.
a vector of names representing the selected best features from the penalized logistic regression.
Mouloud Belbahri
Belbahri, M., Murua, A., Gandouet, O., and Partovi Nia, V. (2020) Qini-based Uplift Regression, <https://arxiv.org/pdf/1911.12474.pdf>
Belbahri, M., Murua, A., Gandouet, O., and Partovi Nia, V. (2021) Uplift Regression : The R Package tools4uplift, <https://arxiv.org/pdf/1901.10867.pdf>
library(tools4uplift)
data("SimUplift")
features <- BestFeatures(data = SimUplift, treat = "treat", outcome = "y",
                         predictors = colnames(SimUplift[, 3:7]),
                         equal.intervals = TRUE, nb.group = 5,
                         validation = FALSE)
features
Univariate optimal partitioning for uplift models. The algorithm quantizes a single variable into bins with significantly different observed uplift.
BinUplift(data, treat, outcome, x, n.split = 10, alpha = 0.05, n.min = 30)
data |
a data frame containing the treatment, the outcome and the predictor to quantize. |
treat |
name of a binary (numeric) vector representing the treatment assignment (coded as 0/1). |
outcome |
name of a binary response (numeric) vector (coded as 0/1). |
x |
name of the explanatory variable to quantize. |
n.split |
number of splits to test at each node. For continuous explanatory variables only (must be > 0). If n.split = 10, the test will be executed at each decile of the variable. |
alpha |
significance level of the statistical test (must be between 0 and 1). |
n.min |
minimum number of observations per child node. |
out.tree |
Descriptive statistics for the different nodes of the tree |
Mouloud Belbahri
Belbahri, M., Murua, A., Gandouet, O., and Partovi Nia, V. (2021) Uplift Regression : The R Package tools4uplift, <https://arxiv.org/pdf/1901.10867.pdf>
predict.BinUplift
library(tools4uplift)
data("SimUplift")
binX1 <- BinUplift(data = SimUplift, treat = "treat", outcome = "y", x = "X1",
                   n.split = 100, alpha = 0.01, n.min = 30)
A non-parametric heat map representing the observed uplift in rectangles that explore a bivariate dimension space. The function also returns the individual uplift based on the heatmap.
BinUplift2d(data, var1, var2, treat, outcome, valid = NULL, n.split = 3, n.min = 30, plotit = FALSE, nb.col = 20)
data |
a data frame containing the uplift model variables. |
var1 |
x-axis variable name. Represents the first dimension of interest. |
var2 |
y-axis variable name. Represents the second dimension of interest. |
treat |
name of a binary (numeric) vector representing the treatment assignment (coded as 0/1). |
outcome |
name of a binary response (numeric) vector (coded as 0/1). |
valid |
a validation data frame containing the uplift model variables. |
n.split |
the number of intervals to consider per explanatory variable. Must be an integer > 1. |
n.min |
minimum number of observations per group (treatment and control) within each rectangle. Must be an integer > 0. |
plotit |
if TRUE, a heatmap of observed uplift per rectangle is plotted. |
nb.col |
number of colors for the heatmap. Default is 20. Must be an integer and should be greater than
returns an augmented dataset with an Uplift_var1_var2 variable representing the predicted uplift for each observation, based on the rectangle it belongs to. If plotit is TRUE, the function also plots a heat map of observed uplifts.
Mouloud Belbahri
Belbahri, M., Murua, A., Gandouet, O., and Partovi Nia, V. (2021) Uplift Regression : The R Package tools4uplift, <https://arxiv.org/pdf/1901.10867.pdf>
library(tools4uplift)
data("SimUplift")
heatmap <- BinUplift2d(SimUplift, "X1", "X2", "treat", "y")
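The rectangle-based estimate described for this function can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: it assumes quantile-based intervals for each variable, uses simulated data rather than SimUplift, and the helpers `cut_q` and `rate` and all column names are hypothetical.

```r
# Simulated data (hypothetical; not the package's SimUplift dataset)
set.seed(11)
n <- 3000
d <- data.frame(treat = rbinom(n, 1, 0.5), x1 = runif(n), x2 = runif(n))
# The treatment only helps when both x1 and x2 are large
d$y <- rbinom(n, 1, 0.2 + 0.5 * d$treat * (d$x1 > 0.5) * (d$x2 > 0.5))

n_split <- 3  # number of intervals per variable

# Hypothetical helper: cut a numeric vector into k quantile-based intervals
cut_q <- function(v, k) {
  cut(v, breaks = quantile(v, probs = seq(0, 1, length.out = k + 1)),
      include.lowest = TRUE)
}
d$b1 <- cut_q(d$x1, n_split)
d$b2 <- cut_q(d$x2, n_split)

# Response rate per rectangle within one arm (returns an n_split x n_split matrix)
rate <- function(arm) {
  keep <- d$treat == arm
  tapply(d$y[keep], list(d$b1[keep], d$b2[keep]), mean)
}

# Observed uplift per rectangle = treated rate - control rate
uplift_grid <- rate(1) - rate(0)
print(round(uplift_grid, 3))

# Each observation inherits the uplift of the rectangle it falls in
d$uplift_hat <- uplift_grid[cbind(as.integer(d$b1), as.integer(d$b2))]
```

In this sketch every rectangle contains both treated and control observations; with sparse data some cells would be empty, which is presumably why the function exposes a minimum group size through n.min.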
Fit the two-model uplift model estimator.
## S3 method for class 'formula'
DualUplift(formula, treat, data, ...)
## Default S3 method:
DualUplift(data, treat, outcome, predictors, ...)
data, formula |
a data frame containing the treatment, the outcome and the predictors or a formula describing the model to be fitted. |
treat |
name of a binary (numeric) vector representing the treatment assignment (coded as 0/1). |
outcome |
name of a binary response (numeric) vector (coded as 0/1). |
predictors |
a vector of names representing the explanatory variables to include in the model. |
... |
additional arguments (other than |
model0 |
Fitted model for control group |
model1 |
Fitted model for treatment group |
Mouloud Belbahri
Hansotia, B., J., and Rukstales B. (2001) Direct marketing for multichannel retailers: Issues, challenges and solutions. Journal of Database Marketing and Customer Strategy Management, Vol. 9(3), 259-266.
Belbahri, M., Murua, A., Gandouet, O., and Partovi Nia, V. (2021) Uplift Regression : The R Package tools4uplift, <https://arxiv.org/pdf/1901.10867.pdf>
library(tools4uplift)
data("SimUplift")
fit <- DualUplift(SimUplift, "treat", "y",
                  predictors = colnames(SimUplift[, 3:12]))
print(fit)
summary(fit)
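For readers unfamiliar with the two-model approach, the idea can be sketched with base R's glm(). This is a sketch that assumes a logistic regression is fitted separately in each arm; the data are simulated and the variable names (x1, x2, uplift_hat) are hypothetical, so it should not be read as the package's internal code.

```r
# Simulated data (hypothetical; not the package's SimUplift dataset)
set.seed(7)
n <- 2000
d <- data.frame(treat = rbinom(n, 1, 0.5), x1 = rnorm(n), x2 = rnorm(n))
# The treatment effect grows with x2
d$y <- rbinom(n, 1, plogis(-1 + 0.5 * d$x1 + d$treat * (0.3 + 0.8 * d$x2)))

# One model per arm: control (treat == 0) and treatment (treat == 1)
model0 <- glm(y ~ x1 + x2, family = binomial, data = d[d$treat == 0, ])
model1 <- glm(y ~ x1 + x2, family = binomial, data = d[d$treat == 1, ])

# Predicted uplift = P(y = 1 | treated) - P(y = 1 | control)
uplift_hat <- predict(model1, newdata = d, type = "response") -
  predict(model0, newdata = d, type = "response")
summary(uplift_hat)
```

Because the two models are fitted independently, their errors do not cancel when the predictions are subtracted, which is a commonly cited weakness of this estimator.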
Fit the interaction uplift model estimator.
## S3 method for class 'formula'
InterUplift(formula, treat, data, ...)
## Default S3 method:
InterUplift(data, treat, outcome, predictors, input = "all", ...)
data, formula |
a data frame containing the treatment, the outcome and the predictors or a formula describing the model to be fitted. |
treat |
name of a binary (numeric) vector representing the treatment assignment (coded as 0/1). |
outcome |
name of a binary response (numeric) vector (coded as 0/1). |
predictors |
a vector of names representing the explanatory variables to include in the model. |
input |
an option for |
... |
additional arguments (other than |
an interaction model
Mouloud Belbahri
Lo, V., S., Y. (2002) The true lift model: a novel data mining approach to response modeling in database marketing. ACM SIGKDD Explorations Newsletter, Vol. 4(2), 78-86.
Belbahri, M., Murua, A., Gandouet, O., and Partovi Nia, V. (2021) Uplift Regression : The R Package tools4uplift, <https://arxiv.org/pdf/1901.10867.pdf>
library(tools4uplift)
data("SimUplift")
fit <- InterUplift(SimUplift, "treat", "y", colnames(SimUplift[, 3:12]))
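The interaction estimator can be illustrated with a single logistic regression that includes treatment-by-predictor interactions. The following is a minimal base R sketch under that assumption; the simulated data and the names x1, x2, d0, d1 and uplift_hat are hypothetical and not taken from the package.

```r
# Simulated data (hypothetical; not the package's SimUplift dataset)
set.seed(7)
n <- 2000
d <- data.frame(treat = rbinom(n, 1, 0.5), x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-1 + 0.5 * d$x1 + d$treat * (0.3 + 0.8 * d$x2)))

# Single model with main effects and treatment interactions
fit <- glm(y ~ treat * (x1 + x2), family = binomial, data = d)

# Score every observation twice: as if treated and as if not treated
d1 <- transform(d, treat = 1)
d0 <- transform(d, treat = 0)

# Predicted uplift = difference between the two predicted probabilities
uplift_hat <- predict(fit, newdata = d1, type = "response") -
  predict(fit, newdata = d0, type = "response")
summary(uplift_hat)
```

The interaction coefficients are what allow the predicted uplift to vary between individuals; without them the treatment would only shift the linear predictor by a constant.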
Fit an interaction uplift model via penalized maximum likelihood. The regularization path is computed for the lasso penalty at a grid of values for the regularization constant.
LassoPath(data, formula)
data |
a data frame containing the treatment, the outcome and the predictors. |
formula |
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. |
a data frame containing the coefficient values and the number of nonzero coefficients for different values of lambda.
Mouloud Belbahri
Friedman, J., Hastie, T. and Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software, Vol. 33(1), 1-22
BestFeatures, glmnet
#See glmnet() from library("glmnet") for more information
Qini curve: the incremental observed uplift with respect to predicted uplift, sorted from the highest to the lowest.
## S3 method for class 'PerformanceUplift'
lines(x, ...)
x |
a table that must be the output of |
... |
additional plot arguments. |
a Qini curve and the associated Qini coefficient
Mouloud Belbahri
Radcliffe, N. (2007). Using control groups to target on predicted lift: Building and assessing uplift models. Direct Marketing Analytics Journal, An Annual Publication from the Direct Marketing Association Analytics Council, pages 14-21.
Belbahri, M., Murua, A., Gandouet, O., and Partovi Nia, V. (2021) Uplift Regression : The R Package tools4uplift, <https://arxiv.org/pdf/1901.10867.pdf>
PerformanceUplift
library(tools4uplift)
data("SimUplift")
model1 <- BinUplift2d(SimUplift, "X1", "X2", "treat", "y")
perf1 <- PerformanceUplift(data = model1, treat = "treat", outcome = "y",
                           prediction = "Uplift_X1_X2",
                           equal.intervals = TRUE, nb.group = 3)
model2 <- BinUplift2d(SimUplift, "X3", "X4", "treat", "y")
perf2 <- PerformanceUplift(data = model2, treat = "treat", outcome = "y",
                           prediction = "Uplift_X3_X4",
                           equal.intervals = TRUE, nb.group = 3)
plot(perf1, type = 'b')
lines(perf2, type = 'b', col = 'red')
Performance table of an uplift model. This table is used to visualize the performance of an uplift model and to compute the Qini coefficient.
PerformanceUplift(data, treat, outcome, prediction, nb.group = 10, equal.intervals = TRUE, rank.precision = 2)
data |
a data frame containing the response, the treatment and predicted uplift. |
treat |
name of a binary (numeric) vector representing the treatment assignment (coded as 0/1). |
outcome |
name of a binary response (numeric) vector (coded as 0/1). |
prediction |
name of a predicted uplift (numeric) vector used to sort the observations from highest to lowest uplift. |
nb.group |
if equal.intervals is TRUE, the number of groups, each with an equal number of observations, into which the data set is partitioned to show results. |
equal.intervals |
flag for using equal intervals (with equal number of observations) or the true ranking quantiles which result in an unequal number of observations in each group. |
rank.precision |
precision for the ranking quantiles. Must be 1 or 2. If 1, the ranking quantiles will be rounded to the first decimal. If 2, to the second decimal. |
a table with descriptive statistics related to an uplift model estimator.
Mouloud Belbahri
Radcliffe, N. (2007). Using control groups to target on predicted lift: Building and assessing uplift models. Direct Marketing Analytics Journal, An Annual Publication from the Direct Marketing Association Analytics Council, pages 14-21.
Belbahri, M., Murua, A., Gandouet, O., and Partovi Nia, V. (2021) Uplift Regression : The R Package tools4uplift, <https://arxiv.org/pdf/1901.10867.pdf>
QiniArea
library(tools4uplift)
data("SimUplift")
model1 <- BinUplift2d(SimUplift, "X1", "X2", "treat", "y")
perf1 <- PerformanceUplift(data = model1, treat = "treat", outcome = "y",
                           prediction = "Uplift_X1_X2",
                           equal.intervals = TRUE, nb.group = 3)
print(perf1)
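To make the content of such a performance table concrete, here is a base R sketch that builds a similar summary by hand for the equal-intervals case. It assumes the table reports, per group of sorted observations, the arm sizes, the response rates and the observed uplift; the exact columns returned by the package may differ, and the data and column names (score, n_treated, rate_treated, and so on) are hypothetical.

```r
# Simulated data with a hypothetical predicted-uplift score
set.seed(3)
n <- 2000
d <- data.frame(treat = rbinom(n, 1, 0.5), score = runif(n))
d$y <- rbinom(n, 1, 0.2 + 0.5 * d$treat * d$score)

nb_group <- 5

# Sort from highest to lowest predicted uplift, then cut into equal-size groups
d <- d[order(d$score, decreasing = TRUE), ]
d$group <- ceiling(seq_len(n) * nb_group / n)  # group 1 = highest scores

# One row of descriptive statistics per group
perf <- do.call(rbind, lapply(split(d, d$group), function(g) {
  data.frame(
    group        = g$group[1],
    n_treated    = sum(g$treat == 1),
    n_control    = sum(g$treat == 0),
    rate_treated = mean(g$y[g$treat == 1]),
    rate_control = mean(g$y[g$treat == 0])
  )
}))

# Observed uplift per group = treated response rate - control response rate
perf$uplift <- perf$rate_treated - perf$rate_control
print(perf)
```

For a model that ranks individuals well, the observed uplift should tend to decrease from the first group to the last.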
Qini curve: the incremental observed uplift with respect to predicted uplift, sorted from the highest to the lowest.
## S3 method for class 'PerformanceUplift'
plot(x, ...)
x |
a table that must be the output of |
... |
additional plot arguments. |
a Qini curve and the associated Qini coefficient
Mouloud Belbahri
Radcliffe, N. (2007). Using control groups to target on predicted lift: Building and assessing uplift models. Direct Marketing Analytics Journal, An Annual Publication from the Direct Marketing Association Analytics Council, pages 14-21.
Belbahri, M., Murua, A., Gandouet, O., and Partovi Nia, V. (2021) Uplift Regression : The R Package tools4uplift, <https://arxiv.org/pdf/1901.10867.pdf>
PerformanceUplift
library(tools4uplift)
data("SimUplift")
model1 <- BinUplift2d(SimUplift, "X1", "X2", "treat", "y")
perf1 <- PerformanceUplift(data = model1, treat = "treat", outcome = "y",
                           prediction = "Uplift_X1_X2",
                           equal.intervals = TRUE, nb.group = 3)
plot(perf1, type = 'b')
Predictions from the univariate quantization method, i.e. this function transforms a continuous variable into a categorical one.
## S3 method for class 'BinUplift'
predict(object, newdata, ...)
object |
an object of class |
newdata |
the variable that was quantized in |
... |
additional arguments to be passed to |
a quantized variable
Mouloud Belbahri
Belbahri, M., Murua, A., Gandouet, O., and Partovi Nia, V. (2021) Uplift Regression : The R Package tools4uplift, <https://arxiv.org/pdf/1901.10867.pdf>
BinUplift
library(tools4uplift)
data("SimUplift")
binX1 <- BinUplift(data = SimUplift, treat = "treat", outcome = "y", x = "X1",
                   n.split = 100, alpha = 0.01, n.min = 30)
quantizedX1 <- predict(binX1, SimUplift$X1)
Predictions from the two-model uplift model estimator with associated model performance.
## S3 method for class 'DualUplift'
predict(object, newdata, ...)
object |
an object of class |
newdata |
a data frame containing the treatment, the outcome and the predictors of observations at which predictions are required. |
... |
additional arguments to be passed to |
a vector of predicted uplift
Mouloud Belbahri
Hansotia, B., J., and Rukstales B. (2001) Direct marketing for multichannel retailers: Issues, challenges and solutions. Journal of Database Marketing and Customer Strategy Management, Vol. 9(3), 259-266.
Belbahri, M., Murua, A., Gandouet, O., and Partovi Nia, V. (2021) Uplift Regression : The R Package tools4uplift, <https://arxiv.org/pdf/1901.10867.pdf>
DualUplift
library(tools4uplift)
data("SimUplift")
fit <- DualUplift(SimUplift, "treat", "y",
                  predictors = colnames(SimUplift[, 3:12]))
pred <- predict(fit, SimUplift)
Predictions from the interaction uplift model estimator with associated model performance.
## S3 method for class 'InterUplift'
predict(object, newdata, treat, ...)
object |
an object of class |
newdata |
a data frame containing the treatment, the outcome and the predictors of observations at which predictions are required. |
treat |
name of a binary (numeric) vector representing the treatment assignment (coded as 0/1). |
... |
additional arguments to be passed to |
a vector of predicted uplift
Mouloud Belbahri
Lo, V., S., Y. (2002) The true lift model: a novel data mining approach to response modeling in database marketing. ACM SIGKDD Explorations Newsletter, Vol. 4(2), 78-86.
Belbahri, M., Murua, A., Gandouet, O., and Partovi Nia, V. (2021) Uplift Regression : The R Package tools4uplift, <https://arxiv.org/pdf/1901.10867.pdf>
InterUplift
library(tools4uplift)
data("SimUplift")
fit <- InterUplift(SimUplift, "treat", "y", colnames(SimUplift[, 3:12]))
pred <- predict(fit, SimUplift, "treat")
Computes the area under the Qini curve.
## S3 method for class 'PerformanceUplift'
QiniArea(x, adjusted = FALSE, ...)
x |
a table that must be the output of |
adjusted |
if TRUE, returns the Qini coefficient adjusted by the Kendall's uplift rank correlation. |
... |
Generic S3 Method argument. |
the Qini or the adjusted Qini coefficient
Mouloud Belbahri
Radcliffe, N. (2007). Using control groups to target on predicted lift: Building and assessing uplift models. Direct Marketing Analytics Journal, An Annual Publication from the Direct Marketing Association Analytics Council, pages 14-21.
Belbahri, M., Murua, A., Gandouet, O., and Partovi Nia, V. (2020) Qini-based Uplift Regression, <https://arxiv.org/pdf/1911.12474.pdf>
Belbahri, M., Murua, A., Gandouet, O., and Partovi Nia, V. (2021) Uplift Regression : The R Package tools4uplift, <https://arxiv.org/pdf/1901.10867.pdf>
PerformanceUplift
library(tools4uplift)
data("SimUplift")
model <- BinUplift2d(SimUplift, "X1", "X2", "treat", "y")
# performance of the heat map uplift estimation on the training dataset
perf <- PerformanceUplift(data = model, treat = "treat", outcome = "y",
                          prediction = "Uplift_X1_X2",
                          equal.intervals = TRUE, nb.group = 5)
QiniArea(perf)
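As a rough illustration of what an area under the Qini curve measures, the sketch below computes one common version of it in base R: the trapezoidal area between the cumulative incremental uplift curve and the straight line of a random targeting. The normalization used by the package may differ, so the numbers are not expected to match QiniArea(); the simulated data and the helpers `cum_by_group` and `trapz` are hypothetical.

```r
# Simulated data with a hypothetical predicted-uplift score
set.seed(3)
n <- 5000
d <- data.frame(treat = rbinom(n, 1, 0.5), score = runif(n))
d$y <- rbinom(n, 1, 0.1 + 0.6 * d$treat * d$score)

nb_group <- 10
d <- d[order(d$score, decreasing = TRUE), ]
grp <- ceiling(seq_len(n) * nb_group / n)  # group 1 = highest scores

# Hypothetical helper: cumulative sum of a quantity over the sorted groups
cum_by_group <- function(v) cumsum(as.vector(tapply(v, grp, sum)))
nt <- cum_by_group(d$treat)               # cumulative treated count
nc <- cum_by_group(1 - d$treat)           # cumulative control count
yt <- cum_by_group(d$y * d$treat)         # cumulative treated responders
yc <- cum_by_group(d$y * (1 - d$treat))   # cumulative control responders

# Cumulative incremental responders, rescaled by the total treated count
inc_uplift <- (yt - yc * nt / nc) / nt[nb_group]

# Curve starting at the origin, and the random-targeting straight line
x <- c(0, seq_len(nb_group) / nb_group)
qini_curve <- c(0, inc_uplift)
random_line <- x * qini_curve[length(qini_curve)]

# Trapezoidal rule
trapz <- function(x, y) sum(diff(x) * (y[-length(y)] + y[-1]) / 2)

qini <- trapz(x, qini_curve) - trapz(x, random_line)
qini
```

A positive value indicates that sorting by the score concentrates the incremental responders among the top-ranked individuals more than a random ordering would.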
A Qini-based LHS (Latin Hypercube Sampling) uplift model.
qLHS(data, treat, outcome, predictors, lhs_points = 50, lhs_range = 1, adjusted = TRUE, rank.precision = 2, equal.intervals = FALSE, nb.group = 10, validation = TRUE, p = 0.3)
data |
a data frame containing the treatment, the outcome and the predictors. |
treat |
name of a binary (numeric) vector representing the treatment assignment (coded as 0/1). |
outcome |
name of a binary response (numeric) vector (coded as 0/1). |
predictors |
a vector of names representing the predictors to consider in the model. |
lhs_points |
number of LHS points to sample for each regularization constant. |
lhs_range |
a multiplicative scalar that controls the variance of the LHS search - Default is 1, the LHS procedure samples points uniformly with variance equal to the variance of the maximum likelihood estimator. |
adjusted |
if TRUE, the adjusted Qini coefficient is used instead of the Qini coefficient. |
rank.precision |
precision for the ranking quantiles to compute the Qini coefficient. Must be 1 or 2. If 1, the ranking quantiles will be rounded to the first decimal. If 2, to the second decimal. |
equal.intervals |
flag for using equal intervals (with equal number of observations) or the true ranking quantiles which result in an unequal number of observations in each group to compute the Qini coefficient. |
nb.group |
the number of groups for computing the Qini coefficient if equal.intervals is TRUE - Default is 10. |
validation |
if TRUE, the best LHS model is selected based on cross-validation - Default is TRUE. |
p |
if validation is TRUE, the desired proportion for the validation set. p is a value between 0 and 1 expressed as a decimal; it is set to be proportional to the number of observations per group - Default is 0.3. |
The regularization parameter is chosen based on the interaction uplift model that maximizes the Qini coefficient of the LHS search.
the models with LHS coefficients of class InterUplift.
Mouloud Belbahri
Belbahri, M., Murua, A., Gandouet, O., and Partovi Nia, V. (2020) Qini-based Uplift Regression, <https://arxiv.org/pdf/1911.12474.pdf>
library(tools4uplift)
data("SimUplift")
upliftLHS <- qLHS(data = SimUplift, treat = "treat", outcome = "y",
                  predictors = colnames(SimUplift[, 3:7]),
                  lhs_points = 5, lhs_range = 1, adjusted = TRUE,
                  equal.intervals = TRUE, nb.group = 5, validation = FALSE)
The synthetic data contains 20 predictors, a treatment allocation variable and a binary outcome variable. This dataset is used in the package examples.
data("SimUplift")
A data frame with 1000 observations on the following 22 variables.
y
a binary response vector
treat
a binary treatment allocation vector
X1
a numeric vector
X2
a numeric vector
X3
a numeric vector
X4
a numeric vector
X5
a numeric vector
X6
a numeric vector
X7
a numeric vector
X8
a numeric vector
X9
a numeric vector
X10
a numeric vector
X11
a numeric vector
X12
a numeric vector
X13
a numeric vector
X14
a numeric vector
X15
a numeric vector
X16
a numeric vector
X17
a numeric vector
X18
a numeric vector
X19
a numeric vector
X20
a numeric vector
data("SimUplift")
Split a dataset into training and validation subsets with respect to the uplift sample distribution.
SplitUplift(data, p, group)
data |
a data frame of interest that contains at least the response and the treatment variables. |
p |
The desired sample size. p is a value between 0 and 1 expressed as a decimal; it is set to be proportional to the number of observations per group. |
group |
Your grouping variables. Generally, for uplift modelling, this should be a vector of treatment and response variables names, e.g. c("treat", "y"). |
train |
a training data frame containing a proportion p of the observations |
valid |
a validation data frame containing the remaining proportion 1-p of the observations |
Mouloud Belbahri
Belbahri, M., Murua, A., Gandouet, O., and Partovi Nia, V. (2021) Uplift Regression : The R Package tools4uplift, <https://arxiv.org/pdf/1901.10867.pdf>
library(tools4uplift)
data("SimUplift")
split <- SplitUplift(SimUplift, 0.8, c("treat", "y"))
train <- split[[1]]
valid <- split[[2]]
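The idea of splitting "with respect to the uplift sample distribution" can be sketched as a stratified split: a proportion p of the rows is sampled inside each treatment-by-outcome cell. The following base R sketch assumes that interpretation; `stratified_split` is a hypothetical helper written for illustration and is not the package's function.

```r
# Simulated data (hypothetical; not the package's SimUplift dataset)
set.seed(42)
n <- 1000
d <- data.frame(treat = rbinom(n, 1, 0.5), y = rbinom(n, 1, 0.3))

# Hypothetical helper: sample a proportion p of rows within each cell
# defined by the grouping variables, so that both subsets keep a
# similar joint distribution of treatment and outcome.
stratified_split <- function(data, p, group) {
  cell <- interaction(data[, group], drop = TRUE)
  idx_by_cell <- split(seq_len(nrow(data)), cell)
  train_idx <- unlist(lapply(idx_by_cell, function(i) {
    i[sample.int(length(i), size = round(p * length(i)))]
  }))
  list(train = data[train_idx, ], valid = data[-train_idx, ])
}

parts <- stratified_split(d, 0.8, c("treat", "y"))

# Cell proportions should be nearly identical in both subsets
round(prop.table(table(parts$train$treat, parts$train$y)), 3)
round(prop.table(table(parts$valid$treat, parts$valid$y)), 3)
```

Stratifying on both variables matters in uplift modeling because the estimate depends on response rates in each arm; a plain random split can leave a small validation set with too few treated or control responders.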
Computes the observed uplift per category and creates a barplot.
UpliftPerCat(data, treat, outcome, x, ...)
data |
a data frame containing the treatment, the outcome and the variable of interest. |
treat |
name of a binary (numeric) vector representing the treatment assignment (coded as 0/1). |
outcome |
name of a binary response (numeric) vector (coded as 0/1). |
x |
name of the explanatory variable of interest. |
... |
extra parameters for the barplot. |
returns a barplot representing the uplift per category.
Mouloud Belbahri
library(tools4uplift)
data("SimUplift")
binX1 <- BinUplift(data = SimUplift, treat = "treat", outcome = "y", x = "X1",
                   n.split = 100, alpha = 0.01, n.min = 30)
SimUplift$quantizedX1 <- predict(binX1, SimUplift$X1)
UpliftPerCat(data = SimUplift, treat = "treat", outcome = "y",
             x = "quantizedX1", xlab = 'Quantized X1', ylab = 'Uplift',
             ylim = c(-1, 1))
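The quantity being plotted, the observed uplift per category, is simply the difference between the treated and control response rates within each category. A minimal base R sketch on simulated data (the data frame, the `cat` column and the category labels are hypothetical):

```r
# Simulated data (hypothetical; not the package's SimUplift dataset)
set.seed(1)
n <- 600
d <- data.frame(
  treat = rbinom(n, 1, 0.5),
  cat   = sample(c("low", "mid", "high"), n, replace = TRUE)
)
# The treatment only helps in the "high" category
d$y <- rbinom(n, 1, 0.2 + 0.3 * d$treat * (d$cat == "high"))

# Response rate per category within each arm
rate_treated <- tapply(d$y[d$treat == 1], d$cat[d$treat == 1], mean)
rate_control <- tapply(d$y[d$treat == 0], d$cat[d$treat == 0], mean)

# Observed uplift per category = treated rate - control rate
uplift <- rate_treated - rate_control[names(rate_treated)]
print(round(uplift, 3))
```

Passing `uplift` to base R's barplot() would give a chart comparable in spirit to the one this function draws.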