I recently ran across an R forecasting package, *prophet*, that I hadn't seen before. This isn't surprising given the flood of new libraries now emerging in the R ecosystem.

Developed by two Facebook data scientists, prophet struck me most for how closely its sweet-spot problem domain aligns with consulting work I'd done a few years ago in digital marketing for a large media company. In that engagement, the challenge was forecasting hundreds of daily time series, each with several years of historical data. The patterns included trend and multiple seasonalities. Predictions were needed over an entire year, and models were to be updated weekly with the latest data.

I started the work with a pretty standard bag of statistical forecasting tricks, including moving averages, seasonal and trend decomposition, exponential smoothing such as Holt-Winters, ARIMA, and even a few econometric alternatives. Alas, after none of these attempts came close to nailing it, I turned to traditional regression and more modern machine learning approaches, though, given autocorrelation of disturbances, these are generally considered anathema for forecasting by statistical purists. I was getting desperate, however. A statistical consolation was that I was interested only in the quality of predictions, not overall model purity.
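For the record, most of that classical toolkit is readily available in R via the forecast package. A minimal sketch on R's built-in AirPassengers series (not the engagement data) might look like this:

```
# a sketch of the classical toolkit via the forecast package,
# illustrated on R's built-in AirPassengers series
library(forecast)
fit_hw <- HoltWinters(AirPassengers)        # Holt-Winters exponential smoothing
fit_arima <- auto.arima(AirPassengers)      # automatically selected ARIMA
fit_stl <- stlf(AirPassengers, h = 12)      # STL decomposition forecast, 12 months ahead
accuracy(forecast(fit_arima, h = 12))       # in-sample accuracy measures
```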

So when I read that "Prophet is a procedure for forecasting time series data. It is based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. It works best with daily periodicity data with at least one year of historical data. Prophet is robust to missing data, shifts in the trend, and large outliers," I just had to test it out. Could this be what I needed years ago?

A data set I'd used to prep for the digital media engagement was daily births in Quebec from 1977-1991. quebec represents counts of 14 years of daily newborn deliveries in the Canadian province. It served nicely for simulating my digital marketing challenges, and I figured it could now help me put prophet through its paces as well.

The remainder of this post examines the results of several modeling exercises in R against the quebec data, divided into a train set with 13 years and a test set with one. prophet is compared against a basic linear model (lm), a generalized additive model (gam), and random forests (randomForest).

At the start, set a few options, load some libraries, and change the working directory.

```
options(warn=-1)
options(scipen = 10)
suppressMessages(library(data.table))
suppressMessages(library(dplyr))
suppressMessages(library(lubridate))
suppressMessages(library(dtplyr))
suppressMessages(library(ggplot2))
suppressMessages(library(ggthemes))
suppressMessages(library(purrr))
suppressMessages(library(splines))
suppressMessages(library(forecast))
suppressMessages(library(prophet))
suppressMessages(library(gam))
suppressMessages(library(randomForest))
setwd("c:/data/quebec")
```

Read the quebec births file, building a data.table with variable names "ds" and "y" representing the date and birth counts to accommodate prophet. Create train and test data.tables -- train with the first 13 years of daily data, test with the 14th year.

```
quebec <- fread("dailybirths.csv")[,`:=`(date=ymd(date))] %>%
setnames(c("date", "daily births"),c("ds","y"))
invisible(quebec <- quebec[,y:=as.numeric(y)] %>% tbl_dt)
str(quebec)
```

```
ntest <- 365
len <- nrow(quebec)
trainq <- quebec[1:(len-ntest)]
testq <- quebec[(len-ntest+1):len]
str(trainq)
str(testq)
c(min(trainq$ds), max(trainq$ds))
c(min(testq$ds), max(testq$ds))
```

Define two small functions to compute root mean squared error (rmse) and mean absolute percentage error (mape) of actual vs. predicted values, à la the forecast package's accuracy measures, for evaluating forecast model performance. Lower is better.

```
rmse <- function(actual, predicted)
  round((sum((actual - predicted)^2) / length(actual))^.5, 2)
mape <- function(actual, predicted)
  round(mean(100 * abs((actual - predicted) / actual)), 2)
```
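A quick sanity check of these helpers on toy vectors of my own (not from the quebec data):

```
# toy actuals and predictions to exercise rmse and mape
actual <- c(100, 200, 300)
predicted <- c(110, 190, 330)
rmse(actual, predicted)   # sqrt((100 + 100 + 900)/3), rounded: 19.15
mape(actual, predicted)   # mean of 10%, 5%, 10%: 8.33
```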

Now start fitting basic lm linear models. The first regresses daily births (y) on a natural cubic spline of the date (ds) to capture trend. Compute the rmse and mape against the training and test data. Not surprisingly, the stats for train are better than for test.

```
slug1 <- lm(y~ns(ds,13),data=trainq)
trainq$pred1 <- predict(slug1)
testq$pred1 <- predict(slug1,newdata=testq)
c(rmse(trainq$y,trainq$pred1),rmse(testq$y,testq$pred1))
c(mape(trainq$y,trainq$pred1),mape(testq$y,testq$pred1))
```

Next run a model of daily births on a day-of-the-week factor to capture intra-week seasonality. This model appears to do better than the first on the performance metrics, indicating significant day-of-the-week seasonality.

```
slug2 <- lm(y~as.factor(lubridate::wday(ds)), data=trainq)
trainq$pred2 <- predict(slug2)
testq$pred2 <- predict(slug2,newdata=testq)
c(rmse(trainq$y,trainq$pred2),rmse(testq$y,testq$pred2))
c(mape(trainq$y,trainq$pred2),mape(testq$y,testq$pred2))
```

The third model creates a factor based on month to handle monthly seasonality. This seasonality doesn't appear to be as strong as day of the week.

```
slug3 <- lm(y~as.factor(lubridate::month(ds)), data=trainq)
trainq$pred3 <- predict(slug3)
testq$pred3 <- predict(slug3,newdata=testq)
c(rmse(trainq$y,trainq$pred3),rmse(testq$y,testq$pred3))
c(mape(trainq$y,trainq$pred3),mape(testq$y,testq$pred3))
```

Finally, run an lm model that includes all three variables above. Note the encouragingly lower rmse and mape.

```
slug4 <- lm(y~ns(ds,13) + as.factor(lubridate::wday(ds)) +
as.factor(lubridate::month(ds)), data=trainq)
trainq$pred4 <- predict(slug4)
testq$pred4 <- predict(slug4,newdata=testq)
c(rmse(trainq$y,trainq$pred4),rmse(testq$y,testq$pred4))
c(mape(trainq$y,trainq$pred4),mape(testq$y,testq$pred4))
```

Now run a similar three-variable generalized additive model with the gam package. Not surprisingly, the rmse and mape for both train and test are comparable to those of the final lm model.

```
slug4a <- gam(y~s(ds,11) + as.factor(lubridate::wday(ds)) +
as.factor(lubridate::month(ds)), data=trainq)
trainq$pred4a <- predict(slug4a)
testq$pred4a <- predict(slug4a,newdata=testq)
c(rmse(trainq$y,trainq$pred4a),rmse(testq$y,testq$pred4a))
c(mape(trainq$y,trainq$pred4a),mape(testq$y,testq$pred4a))
```

Fit a randomForest ML model using similar trend and seasonality attributes. Note the larger disparity between train and test performance, indicating overfitting to the train data; note also that tree-based models cannot extrapolate a trend beyond the range of ds seen in training.

```
slug4b <- randomForest(y ~ ds + as.factor(lubridate::wday(ds)) +
as.factor(lubridate::month(ds)),
data=trainq,ntree=1000)
trainq$pred4b <- predict(slug4b)
testq$pred4b <- predict(slug4b,newdata=testq)
c(rmse(trainq$y,trainq$pred4b),rmse(testq$y,testq$pred4b))
c(mape(trainq$y,trainq$pred4b),mape(testq$y,testq$pred4b))
```

Finally, fit the prophet model to the training data (prophet expects a data.frame with ds and y columns), then predict over both the history and the test year.

```
slug5 <- prophet(as.data.frame(trainq))
future <- make_future_dataframe(slug5, periods = ntest, include_history = TRUE)
forecast <- predict(slug5, future)
n <- nrow(trainq)
trainq$yhat <- forecast$yhat[1:n]
testq$yhat <- forecast$yhat[(n+1):(n+ntest)]
c(rmse(trainq$y,trainq$yhat),rmse(testq$y,testq$yhat))
c(mape(trainq$y,trainq$yhat),mape(testq$y,testq$yhat))
```
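Though unused with quebec, prophet can also model holiday effects, per the quote above. A hedged sketch of the API (the holiday frame below is illustrative, not from this exercise):

```
# illustrative holiday effects: Christmas each year, with a +/- 1 day window;
# dates should cover both the history and the forecast horizon
holidays <- data.frame(
  holiday = "christmas",
  ds = as.Date(paste0(1977:1991, "-12-25")),
  lower_window = -1,
  upper_window = 1
)
slug5h <- prophet(as.data.frame(trainq), holidays = holidays)
```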

At this point, let's take a look at predictions from the various lm models fitted above -- first trend, then day of the week, and finally month.

```
ggplot(aes(y=pred1,x=ds), data=trainq) +
ylim(0,1.25*max(trainq$pred1)) +
labs(title="Trend",x="Year",y="Births") +
geom_line()
```

```
uwdays <- trainq[,.(.N, w=min(ds)),.(as.factor(lubridate::wday(ds)))]$w
days <- lubridate::wday(uwdays,label=TRUE)
ggplot(aes(y=pred2,x=lubridate::wday(ds)),
data=trainq[ds %in% uwdays]) +
scale_x_continuous(breaks=1:length(days),labels=days) +
labs(title="Day of Week",x="Day",y="Births") +
ylim(0,1.25*max(trainq$pred2)) +
geom_point() +
geom_line()
```

```
umonths <-trainq[,.(.N, w=min(ds)),.(as.factor(lubridate::month(ds)))]$w
months <- lubridate::month(umonths,label=TRUE)
ggplot(aes(y=pred3,x=lubridate::month(ds)),
data=trainq[ds %in% umonths]) +
scale_x_continuous(breaks=1:length(months),labels=months) +
labs(title="Month",x="Month",y="Births") +
ylim(0,1.25*max(trainq$pred3)) +
geom_point() +
geom_line()
```

Next graph the components of the prophet model, which look a lot like the charts above -- except that prophet uses day of the year in contrast to month.

```
prophet_plot_components(slug5, forecast, uncertainty = TRUE)
```

Finally, compare the three-attribute lm model, the gam model, the prophet model, and the random forest model using the test data. In each panel, grey denotes actual values while the colors represent model predictions. lm, gam, and prophet perform similarly, while random forest lags.

```
testqstack <- data.table::melt(testq,id="ds", measure=c("yhat", "pred4","pred4a","pred4b"))
model_names <- c(
`yhat` = sprintf("prophet -- %.2f : %.2f",
rmse(testq$y,testq$yhat), mape(testq$y,testq$yhat)),
`pred4` = sprintf("lm -- %.2f : %.2f",
rmse(testq$y,testq$pred4), mape(testq$y,testq$pred4)),
`pred4a` = sprintf("gam -- %.2f : %.2f",
rmse(testq$y,testq$pred4a), mape(testq$y,testq$pred4a)),
`pred4b` = sprintf("randomForest -- %.2f : %.2f",
rmse(testq$y,testq$pred4b), mape(testq$y,testq$pred4b))
)
g1 <- ggplot(aes(y=value,x=ds,color=variable),
data=testqstack) +
geom_line(data=testq,aes(y=y),colour="grey70") +
geom_line() +
facet_wrap(~variable,ncol=2,labeller = as_labeller(model_names)) +
scale_x_date() +
ylim(0,1.25*max(testqstack$value)) +
theme(axis.text= element_text(size=5), axis.title = element_text(size=6)) +
theme(legend.position="none") +
labs(title="Model Comparisons, Test Data, with rmse and mape Statistics.",
x="Time",y="Births")
print(g1)
```

With this particular data, the test predictions of the prophet, lm, and gam models are quite similar -- and superior to randomForest. This is just one evaluation, but prophet performing as well as the hand-specified models here is very encouraging. If prophet's flexible enough to handle other challenges, it could indeed live up to the "forecasting at scale" claim. prophet, where were you when I was in need?