
Jupyter notebooks

Jupyter notebooks as trackers

A Jupyter notebook is a perfectly good experiment tracker for early-stage exploration: a handful of model variants, a single dataset, and results you don't need to keep long term. You don't need to start with more advanced experiment tracking like MLflow or Weights & Biases.

When a notebook is enough

Reach for a notebook (and skip the heavier tools) when:

- you're comparing a handful of model variants, not dozens of runs
- everything runs against a single dataset
- you don't need the results beyond the current round of exploration

If two of those stop being true, graduate to MLflow or Weights & Biases.

What to record inline

Treat the notebook like a lab notebook. Alongside the code, capture the things you'd otherwise forget:

- the question you're trying to answer, written down before you start
- every parameter, collected in one place instead of scattered through cells
- the random seed(s) you pinned
- the data source and version (git commit hash or DVC pointer)
- metrics printed as text so they stay searchable
- a sentence or two on what surprised you after each experiment
- a summary of conclusions, filled in at the top once you're done
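Distilled, that record can start as a single markdown cell. A minimal sketch in the same percent format as the template below, with placeholder values:

```python
# %% [markdown]
# # Experiment: vary lag features
# **Date:** 2026-04-15 · **Seed:** 42
# **Data:** data/sst_sample.csv @ <git commit hash>
# **Question:** does n_lags=6 beat n_lags=3 for predicting Niño 3.4?
# **Conclusion (filled in after running):** ...
```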

Practical hygiene

A few habits that make notebooks much more reliable as a record:

- name files with a date prefix, e.g. 2026-04-15-<topic>.ipynb, so they sort chronologically
- collect every parameter in a single PARAMS dict and change values only there
- pin every source of randomness with an explicit seed
- print metrics as text and keep figures inline rather than writing them out to disk
- append new experiments below the old ones instead of overwriting cells
- restart and run all before saving, so the file reflects a clean top-to-bottom run (this can be scripted; see the sketch below)
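That last habit is easy to automate. A minimal sketch, assuming jupyter and nbconvert are installed; the filename is a placeholder following the date-prefix convention above:

```python
# Re-execute a notebook top to bottom in a fresh kernel, overwriting it in
# place, so the saved outputs cannot reflect stale execution order.
import subprocess

subprocess.run(
    ["jupyter", "nbconvert", "--to", "notebook", "--execute",
     "--inplace", "2026-04-15-lag-features.ipynb"],  # placeholder filename
    check=True,
)
```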

Where notebooks fall short

It's worth knowing the limits so you can spot when it's time to switch:

- metrics live in printed cell output, not in a queryable store, so comparing old runs means re-reading (or parsing) notebooks
- parameters aren't shared across notebooks; reusing them means copying by hand
- side-by-side comparison stops scaling once you have more than a handful of variants or more than one dataset
- there's no shared, long-term record for a team; the notebook on your laptop is the only copy

If you find yourself building scripts to parse metrics out of old notebooks, or copying parameters between notebooks by hand, that’s the signal to move to MLflow (solo / local) or Weights & Biases (team / cloud).
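To make the boundary concrete: the hand-rolled stopgap usually looks something like the helper below (hypothetical, not part of workshop-sst), appending parameters and metrics to a JSON-lines file so they stay machine-readable. Once you're maintaining this, a real tracker does it better.

```python
# Hypothetical stopgap logger: append one record per experiment run to a
# JSON-lines file so metrics stay machine-readable without parsing notebooks.
import json
import time
from pathlib import Path

def log_run(notebook: str, params: dict, metrics: dict,
            path: Path = Path("runs.jsonl")) -> None:
    record = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "notebook": notebook,
        "params": params,
        "metrics": metrics,
    }
    with path.open("a") as f:
        # default=str handles non-JSON values such as Path objects in PARAMS
        f.write(json.dumps(record, default=str) + "\n")

# e.g. log_run("2026-04-15-lags.ipynb", PARAMS, {"r2": baseline["r2_score"]})
```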

Starter template

The template below wires in every habit on this page: a top-of-notebook question and conclusions block, a single PARAMS dict, explicit seeds, “what surprised me” cells, and an append-only structure for new experiments. It's written in jupytext percent format (# %% cell markers), so you can copy the cells into a new .ipynb file to get started, or download the notebook.
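If you have jupytext installed, the percent-format script also converts straight to a notebook. A sketch using jupytext's Python API, with a placeholder output name:

```python
# Convert the percent-format script into a .ipynb file.
import jupytext

nb = jupytext.read("experiment-template.py")
jupytext.write(nb, "2026-04-15-lag-features.ipynb")  # pick your own name
```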

The example uses the workshop-sst pipeline (sst.io, sst.transform, sst.ml), so run it from a clone of that repo with the sst package installed and data/sst_sample.csv and data/nino34_sample.csv present.
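Before running, a quick preflight check (a convenience sketch, not part of the template) fails fast if the sample data isn't where the template expects it:

```python
# Fail fast if the sample CSVs are missing from the working directory.
from pathlib import Path

for p in (Path("data/sst_sample.csv"), Path("data/nino34_sample.csv")):
    assert p.exists(), f"missing {p}: run from the workshop-sst repo root"
```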

experiment-template.py
# %% [markdown]
# # Experiment: <short descriptive title>
#
# **Date:** 2026-04-15
# **Author:** <your name>
# **Notebook file:** `2026-04-15-<topic>.ipynb`
# **Reference codebase:** [chicago-aiscience/workshop-sst](https://github.com/chicago-aiscience/workshop-sst)
#
# ---
#
# ## Question(s)
#
# > State the question this notebook is trying to answer, *before* you start.
# >
# > *Does increasing the number of lag features (3 → 6) improve the Random Forest's
# > ability to predict the Niño 3.4 index from SST?*
#
# ## Summary of conclusions
#
# > Fill this in **after** running the experiments.
# >
# > *Example: On the sample dataset, n_lags=3 and n_lags=6 produced nearly identical
# > test R² (~0.975) — the extra lags neither helped nor hurt meaningfully.*
#
# ## Parameters
#
# | Parameter | Value |
# |---|---|
# | Random seed | 42 |
# | Train/test split | 0.8 / 0.2 (chronological) |
# | Model | `RandomForestRegressor` (sklearn, via `sst.ml`) |
# | Feature column | `sst_c_roll_12` |
# | Target column | `nino34_roll_12` |
# | `n_lags` | 3 (baseline), 6 (variant) |
#
# ## Data
#
# - **Source:** `data/sst_sample.csv`, `data/nino34_sample.csv`
# - **Version / commit:** record the DVC pointer or git commit hash here

# %% [markdown]
# ## Setup
#
# Pin every source of randomness and collect parameters in one place.
# If you change a parameter, change it *here* — don't sprinkle literals
# through the notebook.

# %%
import random
from pathlib import Path

import numpy as np

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

PARAMS = {
    "seed": SEED,
    "test_size": 0.2,
    "feature_col": "sst_c_roll_12",
    "target_col": "nino34_roll_12",
    "n_lags_baseline": 3,
    "n_lags_variant": 6,
    "sst_path": Path("data/sst_sample.csv"),
    "enso_path": Path("data/nino34_sample.csv"),
}
PARAMS

# %% [markdown]
# ## Data

# %%
from sst.io import load_sst, load_enso
from sst.transform import tidy, join_on_month

sst_df = tidy(load_sst(PARAMS["sst_path"]), date_col="date", value_col="sst_c")
enso_df = tidy(load_enso(PARAMS["enso_path"]), date_col="date", value_col="nino34")
joined = join_on_month(sst_df, enso_df)

print(f"Joined shape: {joined.shape}")
print(f"Date range:   {joined['date'].min().date()} → {joined['date'].max().date()}")
joined.head()
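
# %% [markdown]
# ### Optional sanity check
#
# Added sketch. Assumption: `join_on_month` returns a date-sorted frame
# (verify against the `sst` source). The chronological split and lag
# features below both rely on it.

# %%
# The 0.8 / 0.2 chronological split only makes sense on sorted dates.
assert joined["date"].is_monotonic_increasing
# Lag features can't be built over missing values in these columns.
assert not joined[[PARAMS["feature_col"], PARAMS["target_col"]]].isna().any().any()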

# %% [markdown]
# ## Experiment 1 — Baseline (n_lags = 3)
#
# **Hypothesis:** Three months of lag features should capture most of the
# short-term autocorrelation in the Niño 3.4 index. This is the workshop default.

# %%
from sst.ml import predict_enso_from_sst

baseline = predict_enso_from_sst(
    joined,
    target_col=PARAMS["target_col"],
    feature_col=PARAMS["feature_col"],
    test_size=PARAMS["test_size"],
    n_lags=PARAMS["n_lags_baseline"],
    random_state=PARAMS["seed"],
)

print(f"Baseline (n_lags={PARAMS['n_lags_baseline']})")
print(f"  R²:   {baseline['r2_score']:.4f}")
print(f"  RMSE: {baseline['rmse']:.4f}")
print("\nTop features:")
baseline["feature_importance"].head()

# %% [markdown]
# ### What surprised me
#
# > One or two sentences: what worked, what didn't, what you'd try next.

# %% [markdown]
# ## Experiment 2 — More lags (n_lags = 6)
#
# **Hypothesis:** Six months of lags should capture seasonal structure that
# 3 months misses. On a small sample dataset this might overfit instead.

# %%
variant = predict_enso_from_sst(
    joined,
    target_col=PARAMS["target_col"],
    feature_col=PARAMS["feature_col"],
    test_size=PARAMS["test_size"],
    n_lags=PARAMS["n_lags_variant"],
    random_state=PARAMS["seed"],
)

print(f"Variant (n_lags={PARAMS['n_lags_variant']})")
print(f"  R²:   {variant['r2_score']:.4f}")
print(f"  RMSE: {variant['rmse']:.4f}")
print("\nTop features:")
variant["feature_importance"].head()

# %% [markdown]
# ### What surprised me
#
# > One or two sentences: what worked, what didn't, what you'd try next.

# %% [markdown]
# ## Comparison
#
# Print the metrics (searchable) **and** plot predictions vs. actual (scannable).
# Keep both inline — don't write the figure out to `figures/`.

# %%
import matplotlib.pyplot as plt

results = {
    f"Baseline (n_lags={PARAMS['n_lags_baseline']})": baseline,
    f"Variant  (n_lags={PARAMS['n_lags_variant']})": variant,
}

for name, r in results.items():
    print(f"{name}: R²={r['r2_score']:.4f}  RMSE={r['rmse']:.4f}")

fig, axes = plt.subplots(1, 2, figsize=(11, 4), sharey=True)
for ax, (name, r) in zip(axes, results.items()):
    preds = r["predictions"]
    ax.plot(preds["date"], preds["actual"], label="Actual", linewidth=2)
    ax.plot(preds["date"], preds["predicted"], label="Predicted", linestyle="--")
    ax.set_title(f"{name}\nR²={r['r2_score']:.3f}")
    ax.set_xlabel("Date")
    ax.legend()
axes[0].set_ylabel("Niño 3.4 (12-mo rolling)")
fig.autofmt_xdate()
plt.tight_layout()
plt.show()

# %% [markdown]
# ## Conclusions
#
# > Mirror the **Summary of conclusions** at the top, but with more detail.
# > What did you learn? What would you do next? What would you do differently?

# %% [markdown]
# ---
# ## Appendix: new experiments go below this line
#
# Append new experiments as `## Experiment 3 — ...` with the same
# hypothesis → code → "what surprised me" pattern. Don't overwrite cells above.