Deploy with Docker

If you plan to deploy a vetiver model to a public or private cloud rather than Posit Connect, Docker containers are a highly portable option. Vetiver makes Dockerfile creation easy by generating the files you need from your trained model.

Import data

For this demo, we will use data from Tidy Tuesday to predict the number of YouTube likes a television commercial played during the Super Bowl will get, based on qualities such as if the ad included any animals, if the ad was funny, if the ad had any elements of danger, etc.

import pandas as pd
import numpy as np

np.random.seed(500)

raw = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-02/youtube.csv')
# read_csv already returns a DataFrame, so select the target and six predictors directly
df = raw[["like_count", "funny", "show_product_quickly", "patriotic",
          "celebrity", "danger", "animals"]].dropna()

df.head(3)
   like_count  funny  show_product_quickly  ...  celebrity  danger  animals
0      1233.0  False                 False  ...      False   False    False
1       485.0   True                  True  ...       True    True    False
2       129.0   True                 False  ...      False    True     True

[3 rows x 7 columns]

library(tidyverse)
superbowl_ads_raw <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-02/youtube.csv')

superbowl_ads <-
    superbowl_ads_raw %>%
    select(funny:animals, like_count) %>%
    na.omit()

superbowl_ads
# A tibble: 225 × 7
   funny show_product_quickly patriotic celebrity danger animals like_count
   <lgl> <lgl>                <lgl>     <lgl>     <lgl>  <lgl>        <dbl>
 1 FALSE FALSE                FALSE     FALSE     FALSE  FALSE         1233
 2 TRUE  TRUE                 FALSE     TRUE      TRUE   FALSE          485
 3 TRUE  FALSE                FALSE     FALSE     TRUE   TRUE           129
 4 FALSE TRUE                 FALSE     FALSE     FALSE  FALSE            2
 5 TRUE  TRUE                 FALSE     FALSE     TRUE   TRUE            20
 6 TRUE  TRUE                 FALSE     TRUE      TRUE   TRUE           115
 7 TRUE  FALSE                FALSE     TRUE      FALSE  TRUE          1470
 8 FALSE FALSE                FALSE     TRUE      FALSE  FALSE           78
 9 TRUE  TRUE                 FALSE     TRUE      FALSE  TRUE           342
10 FALSE TRUE                 TRUE      TRUE      TRUE   FALSE            7
# ℹ 215 more rows

Build a model

With data in hand, the next step is feature engineering and model estimation. We put these two steps into a single object, such as a pipeline or workflow, and deploy them together, so that the deployed model applies the same preprocessing at prediction time that it used during training.

from sklearn import model_selection, preprocessing, pipeline
from sklearn.ensemble import RandomForestRegressor

# the first column is the target; the remaining six columns are the predictors
X, y = df.iloc[:, 1:], df["like_count"]
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y,
    test_size=0.2
)

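# OrdinalEncoder maps the boolean features to numeric codes
le = preprocessing.OrdinalEncoder().fit(X)
# fit the forest on the encoded training features
rf = RandomForestRegressor().fit(le.transform(X_train), y_train)
# bundle the two fitted steps so they are deployed together
rf_pipe = pipeline.Pipeline([('label_encoder', le), ('random_forest', rf)])

library(tidymodels)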

rf_spec <- rand_forest(mode = "regression")
rf_form <- like_count ~ .

rf_fit <-
    workflow(rf_form, rf_spec) %>%
    fit(superbowl_ads)
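
To see what bundling buys you, here is a minimal sketch on synthetic data (not the Super Bowl ads). It assumes scikit-learn and NumPy are installed, and the names X_demo, y_demo, and demo_pipe are purely illustrative. Unlike the Python code above, which fits the encoder and the forest separately and then assembles them, this sketch fits the Pipeline in a single call; in both cases the resulting object accepts raw features and encodes them internally.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Synthetic stand-in data (an assumption for illustration): six yes/no
# features stored as strings, and a numeric target.
rng = np.random.default_rng(500)
X_demo = rng.choice(["no", "yes"], size=(60, 6))
y_demo = rng.poisson(lam=100, size=60).astype(float)

# Fitting the pipeline fits the encoder first, then fits the forest on the
# encoded features, so the two steps always travel together.
demo_pipe = Pipeline(
    [
        ("encoder", OrdinalEncoder()),
        ("random_forest", RandomForestRegressor(n_estimators=25, random_state=0)),
    ]
)
demo_pipe.fit(X_demo, y_demo)

# Raw string features go straight in; the pipeline applies the encoding it learned.
new_rows = np.array([["yes", "no", "no", "yes", "no", "yes"]])
preds = demo_pipe.predict(new_rows)
print(preds.shape)
```

Because the encoder lives inside the deployed object, a client never has to reproduce the preprocessing before calling the model.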

Create a vetiver model

Next, let’s create a deployable model object with vetiver.

import vetiver

v = vetiver.VetiverModel(
    rf_pipe, 
    prototype_data=X_train, 
    model_name = "superbowl_rf"
)
v.description
'A scikit-learn Pipeline model'

library(vetiver)

v <- vetiver_model(rf_fit, "superbowl_rf")
v

── superbowl_rf ─ <bundled_workflow> model for deployment 
A ranger regression modeling workflow using 6 features

Version your model

We pin our vetiver model to a board to version it. We will also use this board later to create artifacts for our Dockerfile.

import pins

board = pins.board_connect(
    server_url=server_url, # load from an .env file
    api_key=api_key, # load from an .env file
    allow_pickle_read=True
)

vetiver.vetiver_pin_write(board, v)

library(pins)
board <- board_connect() # authenticates via environment variables
vetiver_pin_write(board, v)

Here we are using board_connect(), but you can use other boards such as board_s3(). Read more about how to store and version your vetiver model.

Local boards such as board_folder() will not be immediately available to Docker images created by vetiver. We don’t recommend that you store your model inside your container, but (if appropriate to your use case) it is possible to edit the generated Dockerfile and COPY the folder and model into the container. Alternatively, you can mount the folder as a VOLUME.

Learn more about why we recommend storing your versioned model binaries outside Docker containers in this talk.

Create Docker artifacts

To build a Docker image that can serve your model, you need three artifacts:

  • the Dockerfile itself,
  • a requirements.txt or renv.lock to capture your model dependencies, and
  • an app.py or plumber.R file containing the information to serve a vetiver REST API.

You can create all the needed files with one function.

vetiver.prepare_docker(
    board, 
    "isabel.zimmerman/superbowl_rf",
    version = "20220901T144702Z-fd402",
    port = 8080
)

vetiver_prepare_docker(
    board, 
    "julia.silge/superbowl_rf", 
    docker_args = list(port = 8080)
)

You have now created all the files needed to build your Docker image!
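
Before building, it can help to confirm that the files are where Docker expects them. The following is a small sketch using only the Python standard library. The helper name missing_artifacts is hypothetical, and the file names are assumed from the artifact list above for a Python model; your vetiver version may write different names, so check your working directory and adjust.

```python
from pathlib import Path
import tempfile

def missing_artifacts(directory, expected):
    """Return the expected file names that are not present in `directory`."""
    directory = Path(directory)
    return [name for name in expected if not (directory / name).is_file()]

# File names assumed from the artifact list above (Python flavor).
expected_files = ["Dockerfile", "requirements.txt", "app.py"]

# Demonstrate on a throwaway directory that contains only two of the three files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Dockerfile").write_text("# placeholder\n")
    (Path(tmp) / "app.py").write_text("# placeholder\n")
    missing = missing_artifacts(tmp, expected_files)

print(missing)
```

In a real project you would call missing_artifacts(".", expected_files) from the directory where the files were generated.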

Build and run your Dockerfile

It is time to build your image and run your container. Building the Docker image can take a while, because it installs all the packages needed to make a prediction with this model. Use the command line (not R or Python) to build your Docker image:

docker build -t superbowlads .

Tip

If you are building locally on an ARM machine and deploying an R model, add --platform linux/amd64 to the docker build command so that RSPM (RStudio Package Manager, now Posit Package Manager) can install prebuilt R package binaries quickly.

Now run! To authenticate to your board (to get the pinned vetiver model from, for example, Posit Connect), pass in a file supplying environment variables.

docker run --env-file .env -p 8080:8080 superbowlads

Tip

R users likely will store their environment variables in a file called .Renviron instead of .env.
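
A quick check of the environment file before docker run can catch a missing credential early. This is a sketch under assumptions: CONNECT_SERVER and CONNECT_API_KEY are the variable names pins conventionally reads for a Connect board, but confirm them against your own setup. The helpers read_env_file and missing_keys are hypothetical, and the parser handles only simple KEY=value lines and # comments.

```python
from pathlib import Path
import tempfile

def read_env_file(path):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    env = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an equals sign
            env[key.strip()] = value.strip()
    return env

def missing_keys(env, required):
    """Return required variable names that are absent or empty."""
    return [key for key in required if not env.get(key)]

# Assumed variable names for a Posit Connect board; adjust for other boards.
required = ["CONNECT_SERVER", "CONNECT_API_KEY"]

# Demonstrate on a temporary file with a placeholder value (not a real credential).
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text(
        "# credentials for the board\n"
        "CONNECT_SERVER=https://connect.example.com\n"
    )
    env = read_env_file(env_path)

print(missing_keys(env, required))
```

Here the API key line is deliberately left out, so the check reports it as missing.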

The Docker container is now running locally! You can interact with it, for example by opening http://0.0.0.0:8080/__docs__/ in a browser. If your browser cannot reach 0.0.0.0, try http://127.0.0.1:8080/__docs__/ instead.
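
The container may need a few seconds before it answers requests. One way to wait for it from Python is to poll a lightweight route. The sketch below assumes the API exposes a GET /ping route (vetiver APIs generally do, but verify this on the docs page of your running container) and uses only the standard library. The helper names http_probe and wait_until_ready are hypothetical.

```python
import time
import urllib.error
import urllib.request

def http_probe(url, timeout=2.0):
    """Build a zero-argument function that reports whether `url` answers with HTTP 200."""
    def probe():
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.status == 200
        except (urllib.error.URLError, OSError):
            # connection refused, timeout, or a non-success status: not ready yet
            return False
    return probe

def wait_until_ready(probe, attempts=30, delay=1.0, sleep=time.sleep):
    """Call `probe` until it returns True; return the number of attempts used."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        sleep(delay)
    raise TimeoutError(f"service not ready after {attempts} attempts")

# Offline demonstration with a fake probe that succeeds on the third call.
responses = iter([False, False, True])
attempts_used = wait_until_ready(lambda: next(responses), delay=0, sleep=lambda _: None)
print(attempts_used)
```

Against the running container you would call wait_until_ready(http_probe("http://0.0.0.0:8080/ping")) before sending predictions.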

Make predictions from Docker container

Running a Docker container locally is a great way to test that you can make predictions from your endpoint as expected, using R or Python.

endpoint = vetiver.vetiver_endpoint("http://0.0.0.0:8080/predict")
vetiver.predict(endpoint=endpoint, data=X_test)

new_ads <- superbowl_ads %>% 
    select(-like_count)

endpoint <- vetiver_endpoint("http://0.0.0.0:8080/predict")

predict(endpoint, new_ads)
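
The endpoint is plain HTTP, so clients without vetiver installed can call it too. The sketch below assumes the /predict route accepts a JSON array of records, one object per row keyed by the six predictor names, which appears to mirror what the vetiver clients send. The response layout can differ between R and Python APIs, so it is only decoded from JSON and otherwise returned as-is. The helpers build_predict_request and post_records are hypothetical, and only the standard library is used.

```python
import json
import urllib.request

def build_predict_request(url, records):
    """Create a POST request whose body is a JSON array of row records."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def post_records(url, records, timeout=10):
    """Send the records and decode the JSON response (requires a running container)."""
    request = build_predict_request(url, records)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# One hypothetical ad, using the six predictor columns from the training data.
new_ads = [
    {
        "funny": True,
        "show_product_quickly": False,
        "patriotic": False,
        "celebrity": True,
        "danger": False,
        "animals": True,
    }
]

# Building the request does not touch the network, so this runs without a container.
request = build_predict_request("http://0.0.0.0:8080/predict", new_ads)
print(request.get_method(), request.full_url)
```

With the container running, post_records("http://0.0.0.0:8080/predict", new_ads) returns the decoded predictions.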

When you’re done, stop the container from the command line. Note that the command below stops every container on your machine, not only this one:

docker stop $(docker ps -a -q)

What if I don’t know how to use Docker?

Docker is a great tool for data scientists, so learning the basics is a good idea. These resources can help you get started: