Deploy with Docker

If you plan to deploy a vetiver model to a public or private cloud outside of RStudio Connect, Docker containers are a highly portable solution. vetiver makes Dockerfile creation easy by generating the files you need from your trained model.

Import data

For this demo, we will use data from Tidy Tuesday to predict the number of YouTube likes that a television commercial played during the Super Bowl will get, based on qualities such as whether the ad included any animals, whether it was funny, and whether it had any elements of danger.

import pandas as pd
import numpy as np

np.random.seed(500)

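# read_csv already returns a DataFrame, so no extra conversion is needed
raw = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-02/youtube.csv')

# keep the outcome and six boolean predictors, dropping rows with missing values
df = raw[["like_count", "funny", "show_product_quickly", "patriotic",
    "celebrity", "danger", "animals"]].dropna()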

df.head(3)
   like_count  funny  show_product_quickly  ...  celebrity  danger  animals
0      1233.0  False                 False  ...      False   False    False
1       485.0   True                  True  ...       True    True    False
2       129.0   True                 False  ...      False    True     True

[3 rows x 7 columns]
library(tidyverse)
superbowl_ads_raw <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-02/youtube.csv')

superbowl_ads <-
    superbowl_ads_raw %>%
    select(funny:animals, like_count) %>%
    na.omit()

superbowl_ads
# A tibble: 225 × 7
   funny show_product_quickly patriotic celebrity danger animals like_count
   <lgl> <lgl>                <lgl>     <lgl>     <lgl>  <lgl>        <dbl>
 1 FALSE FALSE                FALSE     FALSE     FALSE  FALSE         1233
 2 TRUE  TRUE                 FALSE     TRUE      TRUE   FALSE          485
 3 TRUE  FALSE                FALSE     FALSE     TRUE   TRUE           129
 4 FALSE TRUE                 FALSE     FALSE     FALSE  FALSE            2
 5 TRUE  TRUE                 FALSE     FALSE     TRUE   TRUE            20
 6 TRUE  TRUE                 FALSE     TRUE      TRUE   TRUE           115
 7 TRUE  FALSE                FALSE     TRUE      FALSE  TRUE          1470
 8 FALSE FALSE                FALSE     TRUE      FALSE  FALSE           78
 9 TRUE  TRUE                 FALSE     TRUE      FALSE  TRUE           342
10 FALSE TRUE                 TRUE      TRUE      TRUE   FALSE            7
# … with 215 more rows

Build a model

With data in hand, the next step is feature engineering and model estimation. We put these two steps into a single object, such as a pipeline or workflow, and deploy them together. (Why are we doing this?)

from sklearn import model_selection, preprocessing, pipeline
from sklearn.ensemble import RandomForestRegressor

X, y = df.iloc[:,1:],df['like_count']
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y,
    test_size=0.2
)

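# fit the encoder on all rows so every category is known, then train the forest
le = preprocessing.OrdinalEncoder().fit(X)
rf = RandomForestRegressor().fit(le.transform(X_train), y_train)
# bundle the two fitted steps so they are deployed together
rf_pipe = pipeline.Pipeline([('label_encoder', le), ('random_forest', rf)])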
library(tidymodels)

rf_spec <- rand_forest(mode = "regression")
rf_form <- like_count ~ .

rf_fit <-
    workflow(rf_form, rf_spec) %>%
    fit(superbowl_ads)
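
One reason to bundle the two steps is that the deployed API can then accept raw features, with no separate encoding code to keep in sync. The sketch below illustrates this on a tiny made-up data set; the names toy, toy_pipe, and new_ad are illustrative only and are not part of the demo.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Made-up rows with the same kind of boolean features as the ads data.
toy = pd.DataFrame(
    {
        "funny": [True, False, True, False, True, False],
        "animals": [False, False, True, True, False, True],
        "like_count": [120.0, 15.0, 300.0, 40.0, 95.0, 60.0],
    }
)
X_toy, y_toy = toy[["funny", "animals"]], toy["like_count"]

# A single fit() call trains the encoder and the forest together.
toy_pipe = Pipeline(
    [
        ("encoder", OrdinalEncoder()),
        ("random_forest", RandomForestRegressor(n_estimators=10, random_state=0)),
    ]
)
toy_pipe.fit(X_toy, y_toy)

# At prediction time the caller passes raw features; encoding happens inside.
new_ad = pd.DataFrame({"funny": [True], "animals": [True]})
toy_prediction = toy_pipe.predict(new_ad)
```

Because the encoder travels with the model, whatever serves toy_pipe only ever needs the raw columns.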

Create a vetiver model

Next, let’s create a deployable model object with vetiver.

import vetiver

v = vetiver.VetiverModel(
    rf_pipe, 
    ptype_data=X_train, 
    model_name = "superbowl_rf"
)
v.description
"Scikit-learn <class 'sklearn.pipeline.Pipeline'> model"
library(vetiver)

v <- vetiver_model(rf_fit, "superbowl_rf")
v

── superbowl_rf ─ <butchered_workflow> model for deployment 
A ranger regression modeling workflow using 6 features

Version your model

We pin our vetiver model to a board to version it. We will also use this board later when creating the artifacts needed to build our Docker image.

import pins

board = pins.board_rsconnect(
    server_url=server_url, # load from an .env file
    api_key=api_key, # load from an .env file 
    allow_pickle_read=True
)

vetiver.vetiver_pin_write(board, v)
library(pins)
board <- board_rsconnect() # authenticates via environment variables
vetiver_pin_write(board, v)

Here we are using board_rsconnect(), but you can use other boards such as board_s3(). Read more about how to store and version your vetiver model.
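
In the Python snippet, server_url and api_key are expected to come from a .env file. Here is a minimal sketch of one way to load them using only the standard library. The helper load_env_file and the variable names CONNECT_SERVER and CONNECT_API_KEY are assumptions for illustration; a package such as python-dotenv is a common alternative.

```python
import os


def load_env_file(path):
    """Read KEY=VALUE lines from `path` into os.environ and return them as a dict."""
    parsed = {}
    with open(path, encoding="utf-8") as fh:
        for raw_line in fh:
            line = raw_line.strip()
            # skip blank lines, comments, and anything that is not KEY=VALUE
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            parsed[key.strip()] = value.strip().strip("'\"")
    os.environ.update(parsed)
    return parsed


# Hypothetical usage, assuming a .env file with these two (assumed) keys:
# load_env_file(".env")
# server_url = os.environ["CONNECT_SERVER"]
# api_key = os.environ["CONNECT_API_KEY"]
```

Keep the .env file out of version control, since it holds your API key.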

Create API and Docker artifacts

Next, let’s create an app.py (Python) or plumber.R (R) file. This file defines the vetiver REST API and will run inside your Docker container.

Once you have this file, vetiver can help you write a Dockerfile.

vetiver.write_app(
    board, 
    "isabel.zimmerman/superbowl_rf", 
    version = "20220901T144702Z-fd402"
)
vetiver.write_docker()
Note

You may need to edit the app.py file so that it loads a .env file for authorization.

Docker also needs a requirements file. You can generate your own, or use vetiver.load_pkgs() to generate a file named vetiver_req.txt with the packages necessary for prediction.
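
If you would rather generate your own requirements file, one possible approach is to pin whatever versions are installed in your training environment. The sketch below uses only the standard library; the helper names and the package list in the comment are assumptions for illustration, not part of vetiver.

```python
from importlib import metadata


def requirement_lines(packages):
    """Return 'name==version' for each installed package, or the bare name otherwise."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(name)  # not installed here, so leave it unpinned
    return lines


def write_requirements(packages, path):
    """Write one requirement per line to `path`."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(requirement_lines(packages)) + "\n")


# Hypothetical usage; the package list is an assumption about what this model needs:
# write_requirements(["vetiver", "pins", "scikit-learn", "pandas"], "requirements.txt")
```

Make sure the file name you write matches the one your Dockerfile copies into the image.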

vetiver_write_plumber(board, "julia.silge/superbowl_rf")
vetiver_write_docker(v, port=8080)
Note

When you run vetiver_write_docker(), you generate two files: the Dockerfile itself and the vetiver_renv.lock file to capture your model dependencies.

You have now created all the artifacts to run your Docker container!

Build and run your Dockerfile

It is time to build and run your container. Building the Docker image can take a while, because it installs all the packages needed to make a prediction with this model. Use the command line (not R or Python) to build your Docker image:

docker build -t superbowlads .
Tip

If you are on an ARM architecture locally and deploying an R model, use --platform linux/amd64 for RSPM’s fast installation of binaries.

Now run! To authenticate to your board (to get the pinned vetiver model from, for example, RStudio Connect), pass in a file supplying environment variables.

docker run --env-file .env -p 8080:8080 superbowlads
Tip

R users will likely store their environment variables in a file called .Renviron instead of .env.

The Docker container is now running locally! You can interact with it, such as by using a browser to visit http://0.0.0.0:8080/__docs__/ (if your browser does not resolve 0.0.0.0, try http://127.0.0.1:8080/__docs__/ instead).
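
If you would rather confirm from a script that the API is up, a small polling loop is one option. This is a sketch only: wait_for_api is a hypothetical helper, and it assumes the documentation page answers with HTTP 200 once the container is ready.

```python
import time
import urllib.error
import urllib.request


def _default_fetch(url):
    """Return the HTTP status code for a GET request to `url`."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status


def wait_for_api(url, timeout_s=30.0, interval_s=1.0, fetch=_default_fetch):
    """Poll `url` until it answers with HTTP 200 or `timeout_s` seconds elapse."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if fetch(url) == 200:
                return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # the container is not accepting connections yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Hypothetical usage against the container started above:
# wait_for_api("http://127.0.0.1:8080/__docs__/")
```

The fetch argument exists only so the loop can be exercised without a running container.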

Make predictions from Docker container

Running a Docker container locally is a great way to test that you can make predictions from your endpoint as expected, using R or Python.

endpoint = vetiver.vetiver_endpoint("http://0.0.0.0:8080/predict")
vetiver.predict(endpoint=endpoint, data=X_test)
new_ads <- superbowl_ads %>% 
    select(-like_count)

endpoint <- vetiver_endpoint("http://0.0.0.0:8080/predict")

predict(endpoint, new_ads)
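
The endpoint is a plain HTTP API, so you can also call it from an environment where vetiver is not installed. The sketch below assumes the /predict route accepts a JSON array of records (one object per row) and replies with JSON; records_payload and post_predict are hypothetical helper names.

```python
import json
import urllib.request


def records_payload(rows):
    """Encode a list of feature dicts as a JSON array-of-records request body."""
    return json.dumps(rows).encode("utf-8")


def post_predict(url, rows, timeout=10):
    """POST `rows` to `url` and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=records_payload(rows),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage with one ad, using the feature names from this demo:
# post_predict(
#     "http://127.0.0.1:8080/predict",
#     [{"funny": True, "show_product_quickly": False, "patriotic": False,
#       "celebrity": True, "danger": False, "animals": True}],
# )
```

The exact shape of the response depends on the API, so inspect it before relying on a particular key.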

When you’re done, stop all Docker containers from the command line with:

docker stop $(docker ps -a -q)

What if I don’t know how to use Docker?

Docker is a great tool for data scientists, so learning the basics is a good idea. These resources can help you get started: