sst.cli

Command-line interface for running the SST ETL workflow.

run(sst=Path('data/sst_sample.csv'), enso=Path('data/nino34_sample.csv'), out_dir=Path('artifacts'), start='2000-01')

Run the SST ETL workflow end-to-end.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| sst | Path | Location of the SST CSV file to ingest. | "data/sst_sample.csv" |
| enso | Path | Location of the ENSO index CSV file to ingest. | "data/nino34_sample.csv" |
| out_dir | Path | Directory where generated summary artifacts are written. | "artifacts" |
| start | str | Earliest date to retain after joining the SST and ENSO data. Parsed to a timestamp via pandas.to_datetime. | "2000-01" |
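As a brief illustration of how a `start` value is interpreted, the sketch below parses a year-month string with `pandas.to_datetime`. Only the parsing call comes from the documentation above; the comparison at the end is an assumption about how the cutoff is applied, since `join_on_month` itself is not shown on this page.

```python
import pandas as pd

# A year-month string parses to midnight on the first day of that month.
start = pd.to_datetime("2000-01")

# The fully spelled-out date yields the same timestamp.
same = pd.to_datetime("2000-01-01")

# Assumed cutoff semantics: observations before `start` would be dropped.
december = pd.Timestamp("1999-12-01")
is_before_cutoff = december < start

print(start, same, is_before_cutoff)
```

Because the default resolves to 1 January 2000, a monthly record dated 2000-01-01 sits exactly on the boundary. Whether that record is kept depends on how `join_on_month` compares dates.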

Returns:

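| Type | Description |
| --- | --- |
| None | Writes a metrics CSV (summary.csv), a trend plot (trends.png), and a correlation scatter plot (scatter_plot.png) to out_dir and prints their locations. |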
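The output-directory handling can be sketched with the standard library alone. The file names below are the ones written in the source listing for `run`; the helper `expected_artifacts` is hypothetical and exists only for this illustration.

```python
import tempfile
from pathlib import Path

# File names taken from the source listing of `run`.
ARTIFACT_NAMES = ("summary.csv", "trends.png", "scatter_plot.png")


def expected_artifacts(out_dir: Path) -> list[Path]:
    """Hypothetical helper: paths that a run is expected to produce."""
    return [out_dir / name for name in ARTIFACT_NAMES]


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "nested" / "artifacts"
    # Same call the command makes: creates missing parents as needed.
    out_dir.mkdir(parents=True, exist_ok=True)
    # Calling it again is harmless because of exist_ok=True.
    out_dir.mkdir(parents=True, exist_ok=True)
    created = out_dir.is_dir()
    names = [p.name for p in expected_artifacts(out_dir)]

print(created, names)
```

Because of `parents=True` and `exist_ok=True`, `out_dir` can be a nested path that does not exist yet, and rerunning the command into the same directory does not raise.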

Source code in src/sst/cli.py
@app.command("run")
def run(
    sst: Path = Path("data/sst_sample.csv"),
    enso: Path = Path("data/nino34_sample.csv"),
    out_dir: Path = Path("artifacts"),
    start: str = "2000-01",
) -> None:
    """Run the SST ETL workflow end-to-end.

    Parameters
    ----------
    sst : pathlib.Path, default="data/sst_sample.csv"
        Location of the SST CSV file to ingest.
    enso : pathlib.Path, default="data/nino34_sample.csv"
        Location of the ENSO index CSV file to ingest.
    out_dir : pathlib.Path, default="artifacts"
        Directory where generated summary artifacts are written.
    start : str, default="2000-01"
        Earliest date to retain after joining the SST and ENSO data. Parsed
        to a timestamp via :func:`pandas.to_datetime`.

    Returns
    -------
    None
        Writes a metrics CSV (``summary.csv``), a trend plot
        (``trends.png``), and a correlation scatter plot
        (``scatter_plot.png``) to ``out_dir`` and prints their locations.
    """

    # Ensure the output directory exists before writing any artifacts.
    out_dir.mkdir(parents=True, exist_ok=True)
    summary_path = out_dir / "summary.csv"
    trend_path = out_dir / "trends.png"
    corr_path = out_dir / "scatter_plot.png"

    # Load each input and tidy it; roll=12 is forwarded to tidy.
    sst_df = tidy(load_sst(sst), date_col="date", value_col="sst_c", roll=12)
    enso_df = tidy(load_enso(enso), date_col="date", value_col="nino34", roll=12)

    # Align the two series by month and drop rows before `start`.
    joined = join_on_month(sst_df, enso_df, start=start)

    summary = metrics(joined)
    summary_path.write_text(summary.to_csv(index=False))

    fig = make_trend_plot(joined)
    fig.savefig(trend_path, dpi=150, bbox_inches="tight")

    fig = make_corr_plot(joined)
    fig.savefig(corr_path, dpi=150, bbox_inches="tight")

    # Report the paths actually written (the plot is saved as scatter_plot.png).
    print(f"Wrote {summary_path}, {trend_path}, and {corr_path}")
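The listing calls `tidy` and `join_on_month` without showing them. The sketch below is one plausible reading of those two steps, not the package's implementation: `tidy_sketch` and `join_on_month_sketch` are hypothetical stand-ins, the `_roll` column name is invented, and it is assumed that `roll` is a rolling-mean window and that the join is an inner join on calendar month.

```python
import pandas as pd


def tidy_sketch(df: pd.DataFrame, date_col: str, value_col: str, roll: int) -> pd.DataFrame:
    """Hypothetical stand-in for `tidy`: parse dates, sort, add a rolling mean."""
    out = df[[date_col, value_col]].copy()
    out[date_col] = pd.to_datetime(out[date_col])
    out = out.sort_values(date_col).reset_index(drop=True)
    # Assumption: `roll` is the window length of a rolling mean.
    out[f"{value_col}_roll"] = out[value_col].rolling(roll, min_periods=1).mean()
    return out


def join_on_month_sketch(left: pd.DataFrame, right: pd.DataFrame, start: str) -> pd.DataFrame:
    """Hypothetical stand-in for `join_on_month`: inner-join on calendar month."""
    left = left.assign(month=left["date"].dt.to_period("M")).drop(columns="date")
    right = right.assign(month=right["date"].dt.to_period("M")).drop(columns="date")
    joined = left.merge(right, on="month", how="inner")
    # Keep only months on or after `start`.
    cutoff = pd.Period(start, freq="M")
    return joined[joined["month"] >= cutoff].reset_index(drop=True)


# Made-up monthly values covering 1999-10 through 2000-03.
months = pd.date_range("1999-10-01", periods=6, freq="MS").strftime("%Y-%m-%d")
sst_raw = pd.DataFrame({"date": months, "sst_c": [18.0, 18.2, 18.4, 18.6, 18.8, 19.0]})
enso_raw = pd.DataFrame({"date": months, "nino34": [-1.0, -0.8, -0.6, -0.4, -0.2, 0.0]})

joined = join_on_month_sketch(
    tidy_sketch(sst_raw, "date", "sst_c", roll=3),
    tidy_sketch(enso_raw, "date", "nino34", roll=3),
    start="2000-01",
)
print(joined)
```

In this sketch the rolling mean is computed before the `start` filter, so the first retained month still averages over earlier observations. Whether the real `tidy` and `join_on_month` behave this way would need to be checked against their source.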