Introduction

I’ve recently been interested in modeling using distributions or simpler methods like linear/logistic regression. Having used deep learning and non-linear methods like XGBoost, it’s been great to return lately to methods that are easy to implement and interpret. I’ve also discovered (or rediscovered) Maximum Likelihood Estimation for estimating model parameters, and now I want to use it everywhere!

I’ll be replicating the work of Fader, Hardie, and Lee and their Beta Geometric/Negative Binomial Distribution (BG/NBD) model for fitting customers’ purchasing patterns and forecasting from them.

I also thought this would be a good example to test out some things I’ve been interested in concerning functional programming in R.

Functional Programming

R is a functional language, and functional programming has lately been gaining interest. Here I test using closures – a function that returns another function. Given how MLE works in R, this project seemed like the perfect candidate: with a closure, the child function stores the environment of the parent function, and for our purposes we’ll use that environment to store the dataset and some other parameters.
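
A minimal toy example (not from the package) shows the pattern:

make_counter <- function() {
  count <- 0
  function() {
    # `count` lives in make_counter's environment, which this inner
    # function carries around with it
    count <<- count + 1
    count
  }
}

counter <- make_counter()
counter()
## [1] 1
counter()
## [1] 2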

MLE

Using MLE to find the parameters of your distribution is amazing! I once used Stan to find the lambda value of a truncated Poisson distribution; I’m assuming MLE was involved there too, but I’m glad to be circling back either way. In brief, we take the negative log-likelihood of our BG/NBD model over the training data and find the parameter values that minimize it, i.e. the values that best fit our data.
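
As a toy illustration of the mechanics (simulated data, not the CDNOW set), here is MLE for a plain Poisson rate:

# Recover a Poisson lambda by minimizing the negative log-likelihood
set.seed(42)
obs <- rpois(1000, lambda = 3.2)

neg_loglik <- function(lambda) -sum(dpois(obs, lambda, log = TRUE))

# The minimizer is the MLE; it should land very close to the true 3.2
optimize(neg_loglik, interval = c(0.01, 20))$minimum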

What I really love about MLE is that we can use models that are easy to explain and understand. Maybe they’re not as powerful as the latest SOTA model, but there are plenty of reasons to prefer them. Training simpler models takes milliseconds and less data than our smartphone generates in a second. ML models can use millions of observations and still sometimes return lackluster results. Modeling with well-known distributions is, in my opinion, often good enough, especially if we’re concerned with results.

The lifetimevalue Package

To make this easy to use I’ve written this as an R package. As long as the data is formatted correctly you should be able to use this in your own analysis.

Analysis

Sample Data

We’ll be using the original CDNOW data set from the paper, “Counting Your Customers” the Easy Way, which covers the transaction history of 2,357 customers from January 1997 through June 1998.

The original code to generate the data was written in MATLAB, and let’s all be glad MATLAB is no longer used as often. Nothing against MATLAB, but the code is very hard to read compared to R or Python. I’ve included an R script to generate the sample data. The columns are:

  • x: Number of repeat transactions
  • t_x: Time of the last repeat transaction
  • T_: Length of time the customer was observed during the calibration period

All times are measured in weeks, and our calibration period is 39 weeks.

The calibration period is the duration in which we collected data. We’d probably call this a training set nowadays.
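
If your data is still a raw transaction log, a sketch like the following shows how the summary could be built with dplyr (the customer_id and date columns are hypothetical; the script included in the package may differ):

library(dplyr)

# One row per transaction in; one row per customer out
summarize_rfm <- function(transactions, calibration_end) {
  transactions |>
    group_by(customer_id) |>
    summarize(
      x = n() - 1,                                     # repeat transactions
      t_x = as.numeric(max(date) - min(date)) / 7,     # weeks to last repeat
      T_ = as.numeric(calibration_end - min(date)) / 7 # weeks observed
    )
}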

data(CDNOW_sample_summary)
head(CDNOW_sample_summary)
## # A tibble: 6 × 4
##   sample_id     x   t_x    T_
##   <chr>     <dbl> <dbl> <dbl>
## 1 0001          2 30.4   38.9
## 2 0002          1  1.71  38.9
## 3 0003          0  0     38.9
## 4 0004          0  0     38.9
## 5 0005          0  0     38.9
## 6 0006          7 29.4   38.9
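
As a quick sanity check, the summary should have one row per customer:

nrow(CDNOW_sample_summary)  # one row per customer; 2,357 per the paper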

Estimating the Parameters

Before we cover the parameter estimation, I want to cover what a closure in R looks like.

Simply enough, it’s a function that returns another function. The outer function captures the dataset in its environment and takes a single parameter, estimate, that toggles whether the inner function returns each observation’s log-likelihood contribution or the negated sum of the log-likelihoods (negated because stats4::mle minimizes). The inner function does the work of calling the rest of the functions used in estimating the parameters of our BG/NBD model.

Just a note: this was not easy to document in R, at least using roxygen2. Instead of documenting the inner function’s arguments with their own tags, we have to describe them in the return tag.

bg_nbd
## function(rfm_data, estimate=FALSE) {
## 
##   bg_nbd_param <- function(r=1, alpha=1, a=1, b=1) {
## 
##     valid_cols <- c('x', 't_x', 'T_')
## 
##     if(!all(valid_cols %in% colnames(rfm_data))) {
## 
##       stop('Valid columns are x, t_x, and T_')
## 
##     }
## 
##     x <- rfm_data$x
##     t_x <- rfm_data$t_x
##     T_ <- rfm_data$T_
## 
##     if(estimate) {
##       -sum(.a1(x, r, alpha) + .a2(x, a, b) + log(exp(.a3(x, T_, r, alpha)) + ifelse(x > 0, exp(.a4(x, t_x, r, alpha, a, b)), 0)))
##     } else {
##       .a1(x, r, alpha) + .a2(x, a, b) + log(exp(.a3(x, T_, r, alpha)) + ifelse(x > 0, exp(.a4(x, t_x, r, alpha, a, b)), 0))
##     }
## 
##   }
## 
## }
## <bytecode: 0x55c25b8c9bb0>
## <environment: namespace:lifetimevalue>
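
The helpers .a1 through .a4 aren’t shown above, but they presumably compute the logs of the A1–A4 terms in Fader and Hardie’s implementation note, where ln L = ln A1 + ln A2 + ln(A3 + A4) and A4 only contributes when x > 0. A sketch under that assumption (the package’s actual internals may differ):

# Hypothetical versions of the internal helpers, following the A1-A4
# decomposition of the BG/NBD log-likelihood
.a1 <- function(x, r, alpha) lgamma(r + x) - lgamma(r) + r * log(alpha)
.a2 <- function(x, a, b) lgamma(a + b) + lgamma(b + x) - lgamma(b) - lgamma(a + b + x)
.a3 <- function(x, T_, r, alpha) -(r + x) * log(alpha + T_)
.a4 <- function(x, t_x, r, alpha, a, b) log(a) - log(b + x - 1) - (r + x) * log(alpha + t_x)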

We’ll now estimate our parameters using MLE! We’ll do this using two different packages. One is stats4, which ships with a standard R installation, and the other is bbmle. bbmle is included because it can also calculate confidence intervals for our estimates.

For reference, the authors’ estimates are:

  • r: 0.243
  • alpha: 4.414
  • a: 0.793
  • b: 2.426
bg_nbd_est <- bg_nbd(CDNOW_sample_summary, estimate = TRUE)
param_est <- stats4::mle(bg_nbd_est, 
                         start = c(1, 4, 1, 2), 
                         lower = c(0.001, 0.001, 0.001, 0.001))
param_est
## 
## Call:
## stats4::mle(minuslogl = bg_nbd_est, start = c(1, 4, 1, 2), lower = c(0.001, 
##     0.001, 0.001, 0.001))
## 
## Coefficients:
##         r     alpha         a         b 
## 0.2426707 4.4159851 0.7928716 2.4247084

Comparing to the authors’ estimates, we’re very close!

Next, we’ll use the bbmle package.

param_est <- bbmle::mle2(bg_nbd_est, 
                         method = "L-BFGS-B", 
                         start = list(r = .5, alpha = 4, a = .8, b = 2), 
                         lower = c(0.001, 0.001, 0.001, 0.001))
param_est
## 
## Call:
## bbmle::mle2(minuslogl = bg_nbd_est, start = list(r = 0.5, alpha = 4, 
##     a = 0.8, b = 2), method = "L-BFGS-B", lower = c(0.001, 0.001, 
##     0.001, 0.001))
## 
## Coefficients:
##         r     alpha         a         b 
## 0.2425990 4.4142335 0.7931941 2.4264155 
## 
## Log-likelihood: -9582.43

This time we’re more or less spot on!

Standard Errors

To note, some of these parameters have fairly large standard errors compared to the size of the estimate.

bbmle::stdEr(param_est)
##          r      alpha          a          b 
## 0.01255808 0.37830214 0.18583110 0.70553411
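
Dividing each standard error by its estimate makes this concrete (assuming coef dispatches for bbmle’s mle2 objects, as it does for stats4 fits):

# Relative standard errors: SE / estimate
bbmle::stdEr(param_est) / stats4::coef(param_est)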

Confidence Intervals

We’ll finally conclude with confidence intervals. Just as the standard errors suggested, some of the estimates have wide ranges – namely a and b. Given that these are the parameters of the beta distribution controlling the probability of a customer becoming inactive, our data likely has a lot of variance there, which means our ability to forecast a customer becoming inactive could be improved.

bbmle::confint(param_est)
##           2.5 %    97.5 %
## r     0.2192626 0.2686068
## alpha 3.7245106 5.2113410
## a     0.5159497 1.3188906
## b     1.4299255 4.5672882

Forecasting

Now that we have our model parameters, we’ll use them to forecast a customer’s expected number of transactions. For our example customer we’ll use their repeat transactions: 2, time of last repeat purchase: 30.43 weeks, calibration period: 38.86 weeks, and finally the range we want to forecast over: 39 weeks.
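
For reference, the quantity being computed is the conditional expectation from the paper (their equation (10)), which the package’s forecast() closure presumably implements:

$$
E[Y(t) \mid x, t_x, T] = \frac{\frac{a+b+x-1}{a-1}\left[1-\left(\frac{\alpha+T}{\alpha+T+t}\right)^{r+x}{}_2F_1\!\left(r+x,\,b+x;\,a+b+x-1;\,\frac{t}{\alpha+T+t}\right)\right]}{1+\delta_{x>0}\,\frac{a}{b+x-1}\left(\frac{\alpha+T}{\alpha+t_x}\right)^{r+x}}
$$

where 2F1 is the Gaussian hypergeometric function.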

e_yt <- forecast()
purchase_pred <- e_yt(2, 30.43, 38.86, 39)
purchase_pred
## [1] 1.225763

We find an expected number of transactions of about 1.23. If we look in the raw data for this customer (id: 0001), they did have 1 transaction after the calibration period ended!
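
To score the whole customer base instead of one customer, the same closure can be applied column-wise (assuming e_yt is vectorized over its inputs, as the likelihood code above was):

# Expected transactions over the next 39 weeks for every customer
preds <- with(CDNOW_sample_summary, e_yt(x, t_x, T_, 39))
head(preds)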

Conclusion

MLE is a very powerful tool for estimating model parameters. With it we can forecast a customer’s repeat purchasing and use it in plenty of other ways too, like A/B testing.

Functional programming in R is very useful too. Closures allow us to attach an environment to a function instead of relying on a shared global environment.