
Model serving

What it is

Once a model is trained and versioned, the last step is putting it somewhere a collaborator or downstream user can actually call it. You have three broad options:

The reference implementation is Dockerfile.serve in chicago-aiscience/workshop-sst. It wraps a trained .joblib behind a small FastAPI app with /health, /model-info, and /predict endpoints.

Why use it

A container is the most portable handoff format: anyone with Docker can pull the image and get a working prediction endpoint, with no Python environment to set up and no dependencies to install.

Publishing to GitHub Container Registry (GHCR) ties the image to the repository and makes it docker pull-able by any collaborator with access.
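
Publishing is usually automated in CI. A minimal GitHub Actions workflow along these lines could build and push the image on every push to main (the workflow name, trigger, and tag scheme are assumptions, not the repo's actual setup):

```yaml
name: publish-serve-image
on:
  push:
    branches: [main]
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write   # lets GITHUB_TOKEN push to GHCR
    steps:
      - uses: actions/checkout@v4
      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push
        run: |
          docker build -f Dockerfile.serve \
            -t ghcr.io/${{ github.repository }}-serve:latest .
          docker push ghcr.io/${{ github.repository }}-serve:latest
```

Because the workflow authenticates with the built-in GITHUB_TOKEN, no extra registry credentials need to be stored as secrets.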

When to use it

How to use it

Build and run locally

From the project root:

# Build
docker build -f Dockerfile.serve -t workshop-sst-serve:local .

# Run
docker run -p 8000:8000 workshop-sst-serve:local
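
The Dockerfile.serve behind that build might look roughly like this. The base image, filenames, and dependency layout here are illustrative assumptions, not the repo's actual file:

```dockerfile
FROM python:3.11-slim
WORKDIR /app

# Install serving dependencies first so the layer caches across model changes
COPY requirements-serve.txt .
RUN pip install --no-cache-dir -r requirements-serve.txt

# Copy the app and a default model artifact (mountable at run time)
COPY serve.py .
COPY model/model.joblib /app/model/model.joblib

EXPOSE 8000
CMD ["uvicorn", "serve:app", "--host", "0.0.0.0", "--port", "8000"]
```

Copying the model artifact last keeps rebuilds cheap when only the model changes.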

Exercise the API

# Health
curl http://localhost:8000/health

# Model metadata (version, training run, etc.)
curl http://localhost:8000/model-info

# Predict
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [[0.5, 0.3, 0.2, 0.1, 0.4, 0.3, 0.2]]}'

Pull and run from GHCR

Once the image is published to GHCR, any collaborator can skip the build step:

docker pull ghcr.io/chicago-aiscience/workshop-sst-serve:latest
docker run -p 8000:8000 ghcr.io/chicago-aiscience/workshop-sst-serve:latest

Swap in a different model without rebuilding

The container reads the model from /app/model/model.joblib. Mount your own file over that path and the same image now serves a different model:

docker run -p 8000:8000 \
  -v /path/to/model.joblib:/app/model/model.joblib:ro \
  ghcr.io/chicago-aiscience/workshop-sst-serve:latest

That makes the image a reusable serving shell: one container image, any number of models. Pair it with DVC to pull a specific model version by hash, then mount that file when you start the container.
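
A version-pinned handoff along those lines could look like this (the model path and revision below are placeholders, not paths confirmed by the repo):

```shell
# Fetch the exact model artifact recorded at a given tag or commit
# (DVC-tracked path and revision are placeholders)
dvc get https://github.com/chicago-aiscience/workshop-sst \
  models/model.joblib --rev <tag-or-commit>

# Mount the fetched file into the stock serving image
docker run -p 8000:8000 \
  -v "$(pwd)/model.joblib:/app/model/model.joblib:ro" \
  ghcr.io/chicago-aiscience/workshop-sst-serve:latest
```

dvc get downloads the artifact without requiring a local clone, so a collaborator needs only DVC and Docker to serve any recorded model version.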

Other options

Pros and cons

| Pros | Cons |
| --- | --- |
| Single-command handoff: docker run and you have a prediction API | Consumers need Docker installed |
| Image is self-contained with no Python environment to manage | Container images can be large (hundreds of MB) |
| Mount-based model swap enables one-image-many-models serving | Building and publishing adds a CI step |
| GHCR integrates cleanly with GitHub access control | Private images require authentication to pull |

Reference