When you train a model, it is worth pausing to ask whether — six months from now — you could answer:
- Which data was used?
- Which version of the code ran?
- What parameters were set?
- Which artifacts were produced?
- Can I reproduce the results?
Without tracking, those answers live in memory, filenames, or convention, and diverge quickly across experiments and collaborators. The pages in this section describe a progressive stack: start with Git, and layer on additional tools as your experiments grow in scope.
## What to track
| Layer | Tool | When |
|---|---|---|
| Code | Git / GitHub | Always |
| Data | Data Version Control (DVC) | Files over 100 MB, or data that changes across runs |
| Models & experiments | MLflow or Weights & Biases | When you need to compare runs by parameters and metrics |
| Serving | Docker image on GHCR, or HuggingFace | When you want to hand a working model to collaborators or users |
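
To make the layering concrete, here is a minimal sketch of a single tracked run, assuming a local MLflow installation and a Git checkout; the experiment name, parameters, metric, and artifact path are illustrative, not taken from the reference codebase.

```python
import subprocess

import mlflow

# Record the exact code version alongside the run (the Git layer).
commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

mlflow.set_experiment("sst-enso")  # hypothetical experiment name

with mlflow.start_run():
    mlflow.set_tag("git_commit", commit)     # which version of the code ran
    mlflow.log_param("learning_rate", 1e-3)  # what parameters were set
    mlflow.log_param("epochs", 20)
    # ... training happens here ...
    mlflow.log_metric("val_rmse", 0.42)      # results to compare across runs
    mlflow.log_artifact("model.pt")          # produced artifact (hypothetical path)
```

Each tag, parameter, and artifact maps onto one of the questions at the top of this page; the remaining one, which data was used, is what DVC adds.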
## Pages in this section
Git & GitHub: the baseline every other tool links against.
Notebooks: notebooks as the first line for experiment tracking.
MLflow: local experiment tracker with a UI and model registry.
Weights & Biases (W&B): cloud experiment tracker; good for distributed teams.
DVC: content-hashed versioning for large data and model files.
MLflow + DVC: combined workflow for full run reproducibility. The same pattern works with W&B.
Model serving: package a trained model behind an HTTP API with Docker and GHCR.
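
As a preview of that combined workflow, the sketch below records the exact data version a run consumed, assuming a DVC-tracked file at `data/sst.nc` (a hypothetical path) alongside the MLflow setup above.

```python
import dvc.api
import mlflow

DATA_PATH = "data/sst.nc"  # hypothetical DVC-tracked file

# Resolve the content-addressed storage URL of the data version pinned
# by the current Git revision: this answers "which data was used?".
data_url = dvc.api.get_url(DATA_PATH)

with mlflow.start_run():
    mlflow.set_tag("dvc_data_url", data_url)
    # Stream the DVC-tracked file without a full `dvc pull`.
    with dvc.api.open(DATA_PATH, mode="rb") as f:
        raw = f.read()
    mlflow.log_param("data_bytes", len(raw))  # placeholder for real training
```

The W&B variant is structurally the same: pass the data URL through `wandb.init(config=...)` and log metrics with `wandb.log`.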
## Decision tree
The fastest way to pick a stack is to match your situation against the recommendations below.
## Recommendations by use case
| Use case | Recommended stack |
|---|---|
| Solo researcher, local development | Git (commit small models directly) + MLflow |
| Team collaboration, remote members | W&B (+ DVC if data over 100 MB) |
| Strict data lineage across runs | DVC with MLflow or W&B |
| On-premise / no outbound network | MLflow + DVC with a local or shared-storage remote |
| Sharing a model with collaborators | Docker image on GHCR |
| Publishing a model publicly | HuggingFace |
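
From the consumer's side, the serving layer reduces to an HTTP call. Here is a minimal sketch; the host, port, route, and payload shape are assumptions, not the actual contract of the reference API.

```python
import requests

# Hypothetical endpoint exposed by the Dockerized prediction service.
URL = "http://localhost:8000/predict"

# Hypothetical payload: a small grid of sea-surface-temperature anomalies.
payload = {"sst_anomalies": [[0.1, 0.3], [0.2, 0.4]]}

resp = requests.post(URL, json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())  # e.g. an ENSO index prediction, under the assumed contract
```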
## Reference codebase
Every example on these pages comes from chicago-aiscience/workshop-sst — a working SST-to-ENSO prediction model with DVC, MLflow, W&B, and a Docker-served prediction API integrated end-to-end.