
Running Jobs (SLURM)

These patterns apply to both RCC and DSI because both use SLURM. These templates are designed to be copied into your project and edited.

When to use interactive vs batch jobs

Use case               | Recommended
Debugging code         | srun
Testing environments   | srun
Exploratory analysis   | srun
Long-running jobs      | sbatch
Production workflows   | sbatch

sbatch Batch job submissions

Before submitting:

GPU job (batch)

Submit:

sbatch scripts/gpu.sbatch

To determine which partitions you can run on, list the available partitions on the cluster:

sinfo -a

(RCC partitions: https://docs.rcc.uchicago.edu/partitions/)
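
If you need a GPU partition, a custom sinfo format can also show the generic resources (GRES) each partition offers; the format string below is one reasonable choice:

sinfo -o "%P %G %l %D"    # partition, GRES (e.g. gpu:...), time limit, node count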

File contents:

gpu.sbatch
#!/bin/bash
# NOTE: Consider enabling SLURM email notifications (see Appendix: SLURM Email Notifications)
#SBATCH --job-name=gpu_example
##SBATCH --account=<PI_ACCOUNT>  # <-- change to an allowed account on your cluster - RCC CLUSTER ONLY (uncomment if needed)
#SBATCH --partition=general      # <-- change to an allowed GPU partition on your cluster
#SBATCH --gres=gpu:1             # <-- request 1 GPU (adjust the count as needed)
##SBATCH --gres=local:200G       # <-- Request node local storage - DSI CLUSTER ONLY (uncomment if needed)
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G
#SBATCH --time=02:00:00
#SBATCH --output=/path/to/logs/%x_%j.out
#SBATCH --error=/path/to/logs/%x_%j.err

# ===============================
# GPU JOB TEMPLATE (ANNOTATED)
# ===============================
# Tips:
# - Request the minimum resources you need (time/mem/CPU/GPU) to start sooner.
# - On RCC compute nodes, outbound internet is typically blocked.
# - Create a logs/ directory (and any output dirs) before submitting.
# - Uncomment and edit the steps below to customize this script to your workflow.

set -euo pipefail

# 1) Load modules (adjust to your software stack)
# module purge
# module load cuda/12.1  # example; check `module avail` on your cluster

# 2) Activate your environment
# Recommended: keep envs in your project/scratch, not in $HOME if large.
# source /path/to/venv/bin/activate
# OR for conda/mamba:
# source ~/.bashrc
# conda activate myenv

# 3) (Optional) Use node-local scratch for high I/O temporary files
# SLURM may set $TMPDIR / $SLURM_TMPDIR on some clusters.
# If set, it is FAST but DELETED when the job ends.
WORKDIR="${SLURM_TMPDIR:-/local/scratch}/${USER}_${SLURM_JOB_ID}"
mkdir -p "$WORKDIR"
echo "Working directory: $WORKDIR"

# 4) Copy inputs to node-local storage (optional)
# cp -r /path/to/input "$WORKDIR/"

# 5) Run your workload
# Example: Python training script (replace with your command)
python -u /path/to/your/gpu.py \
  --epochs 5 \
  --batch-size 64 \
  --outdir "${WORKDIR}/run_${SLURM_JOB_ID}"

# 6) Copy outputs back to persistent storage if you used node-local scratch
# rsync -av "$WORKDIR/" /scratch/midway3/$USER/somewhere/

# 7) Deactivate your environment
# deactivate
# conda deactivate

echo "Done."

See this repository directory for the full example: https://github.com/chicago-aiscience/chicago-aiscience.github.io/tree/main/docs/user_guide/scripts/gpu
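
As the template's tips note, the logs directory referenced by --output/--error must exist before you submit. One possible sequence (paths are placeholders):

mkdir -p /path/to/logs        # create the directory used by --output/--error
sbatch scripts/gpu.sbatch     # submit the job
squeue -u $USER               # confirm it is queued (PD) or running (R)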

Job arrays

Run many similar jobs over different inputs. Reference: https://slurm.schedmd.com/job_array.html

Submit:

sbatch scripts/array.sbatch

File contents:

array.sbatch
#!/bin/bash
# NOTE: Consider enabling SLURM email notifications (see Appendix: SLURM Email Notifications)
#SBATCH --job-name=array_example
##SBATCH --account=<PI_ACCOUNT>        # <-- change to an allowed account on your cluster - RCC CLUSTER ONLY (uncomment if needed)
#SBATCH --partition=<PARTITION>         # <-- change to an allowed partition on your cluster
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G
##SBATCH --gres=local:200G              # <-- Request node local storage - DSI CLUSTER ONLY (uncomment if needed)
#SBATCH --time=00:30:00
#SBATCH --array=0-9                     # 10 tasks: indices 0..9
#SBATCH --output=/path/to/logs/%x_%A_%a.out
#SBATCH --error=/path/to/logs/%x_%A_%a.err

# =====================================
# JOB ARRAY TEMPLATE (ANNOTATED)
# =====================================
# Use job arrays when you have many similar tasks over different inputs:
# - parameter sweeps
# - per-sample preprocessing
# - independent simulations
#
# Key environment variables:
# - SLURM_ARRAY_JOB_ID  (the parent job ID)
# - SLURM_ARRAY_TASK_ID (the index for this array task)

set -euo pipefail

TASK_ID="${SLURM_ARRAY_TASK_ID}"
echo "Array task: ${TASK_ID}"

# Option A: Pass input file directly to Python script
# The Python script will use SLURM_ARRAY_TASK_ID to select which element to process
MANIFEST="/path/to/your/input.json"
echo "Input file: ${MANIFEST}"
echo "Array task ID: ${TASK_ID}"

# Run your program
python -u /path/to/your/array.py --input "$MANIFEST" --output "/path/to/your/results/out_${TASK_ID}.txt"

# Option B: Map task IDs to input file elements via a manifest (alternative approach)
# MANIFEST="inputs.txt"
# INPUT=$(sed -n "$((TASK_ID+1))p" "$MANIFEST")  # +1 because sed is 1-indexed
# echo "Input for this task: ${INPUT}"
# python -u array.py --input "$INPUT" --output "outputs/out_${TASK_ID}.txt"

# Option C: Map task IDs to parameters (example)
# PARAM=$(python - <<'PY'
# import os
# tid = int(os.environ["SLURM_ARRAY_TASK_ID"])
# print([0.1,0.2,0.5,1.0,2.0][tid])
# PY
# )
# echo "Param: $PARAM"

echo "Done."

See this repository directory for the full example: https://github.com/chicago-aiscience/chicago-aiscience.github.io/tree/main/docs/user_guide/scripts/array
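
Option B works well when you generate the manifest just before submitting. A minimal sketch (the data path and file pattern are hypothetical) that also sizes the array to match the manifest:

ls /path/to/your/data/*.csv > inputs.txt          # one input path per line
N=$(wc -l < inputs.txt)                           # number of tasks needed
sbatch --array=0-$((N-1)) scripts/array.sbatch    # command-line --array overrides the value in the script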

Multi-node MPI jobs

Run a program across multiple nodes of the cluster. Reference: https://docs.open-mpi.org/en/main/launching-apps/slurm.html

Submit:

sbatch scripts/mpi.sbatch

RCC Cluster submission file contents:

mpi-rcc.sbatch
#!/bin/bash
#SBATCH --account=<PI_ACCOUNT>    # <-- change to an allowed account
#SBATCH --job-name=mpi_example
#SBATCH --partition=<PARTITION>    # <-- change to an allowed partition
#SBATCH --nodes=2                  # <-- adjust the node count as needed
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=1
#SBATCH --mem=0
#SBATCH --time=01:00:00
#SBATCH --output=/path/to/your/project/logs/%x_%j.out    # <-- change to your project's logs directory
#SBATCH --error=/path/to/your/project/logs/%x_%j.err    # <-- change to your project's logs directory

set -euo pipefail

# ---- User paths ----
SCRIPT="/path/to/your/project/mpi.py"
INPUT="/path/to/your/project/mpi_input.txt"
LOGS="/path/to/your/project/logs"
RESULTS="/path/to/your/project/results"
VENV="/path/to/your/project/.venv"

mkdir -p "$LOGS" "$RESULTS"

# ---- Runtime settings ----
export PYTHONUNBUFFERED=1

# Open MPI transport preferences (Midway3)
# - Try UCX first; keep TCP + shared memory as fallback
export OMPI_MCA_pml=ucx
export OMPI_MCA_btl=self,vader,tcp
export OMPI_MCA_btl_tcp_if_include=ib0

# ---- Environment ----
module load openmpi/4.1.8
source "${VENV}/bin/activate"

echo "Nodes allocated:"
scontrol show hostnames "$SLURM_NODELIST"
echo "Total tasks: ${SLURM_NTASKS} (ntasks-per-node: ${SLURM_NTASKS_PER_NODE:-unknown})"

# Hostfile from Slurm allocation (cleaned up on exit)
HOSTFILE="$(mktemp "${SLURM_SUBMIT_DIR:-/tmp}/hostfile.${SLURM_JOB_ID}.XXXX")"
trap 'rm -f "$HOSTFILE"' EXIT
scontrol show hostnames "$SLURM_NODELIST" > "$HOSTFILE"

# Optional debug: sbatch --export=ALL,DEBUG=1 mpi.sbatch
DEBUG="${DEBUG:-0}"
if [[ "$DEBUG" == "1" ]]; then
  echo "Hostfile: $HOSTFILE"
  cat "$HOSTFILE"
  echo "Interfaces per node:"
  srun -N "$SLURM_JOB_NUM_NODES" -n "$SLURM_JOB_NUM_NODES" --ntasks-per-node=1 \
    bash -lc 'echo HOST=$(hostname); ip -o -4 addr show | awk "{print \$2,\$4}"'
  echo "mpi4py linked against:"
  python -c "import mpi4py.MPI as M; print(M.Get_library_version())"
fi

# ---- Launch (Open MPI via mpirun; works around Slurm PMI/PMIx limitations) ----
mpirun -np "$SLURM_NTASKS" \
  --hostfile "$HOSTFILE" \
  --map-by "ppr:${SLURM_NTASKS_PER_NODE}:node" \
  --bind-to none \
  --tag-output --timestamp-output \
  python -u "$SCRIPT" \
    --input "$INPUT" \
    --output "${RESULTS}/out_${SLURM_JOB_ID}.txt"

# ---- Deactivate environment ----
# conda deactivate
deactivate
echo "Done."

DSI Cluster submission file contents:

mpi-dsi.sbatch
#!/bin/bash
#SBATCH --job-name=mpi_example
#SBATCH --partition=general
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=1
#SBATCH --mem=0
#SBATCH --time=01:00:00
#SBATCH --output=/path/to/your/project/logs/%x_%j.out
#SBATCH --error=/path/to/your/project/logs/%x_%j.err

set -euo pipefail

# Optional debug: sbatch --export=ALL,DEBUG=1 mpi.sbatch
DEBUG="${DEBUG:-0}"

# ---- Paths ----
SCRIPT="/path/to/your/project/mpi.py"
INPUT="/path/to/your/project/mpi_input.txt"
LOGS="/path/to/your/project/logs"
RESULTS="/path/to/your/project/results"
VENV="/path/to/your/project/.venv"
MPI_TYPE="pmix_v3"

mkdir -p "$LOGS" "$RESULTS"

# ---- Runtime settings ----
export PYTHONUNBUFFERED=1

# Force Open MPI to use TCP over a clean interface set (avoid docker/loopback bridges)
export OMPI_MCA_btl=self,tcp
export OMPI_MCA_btl_tcp_if_exclude=lo,docker0,virbr0
# Conservative PML (avoid UCX surprises on this cluster)
export OMPI_MCA_pml=ob1

# ---- Environment ----
# module purge
# module load openmpi/<version>
source "${VENV}/bin/activate"

echo "Nodes allocated:"
scontrol show hostnames "$SLURM_NODELIST"
echo "Total tasks: ${SLURM_NTASKS} (ntasks-per-node: ${SLURM_NTASKS_PER_NODE:-unknown})"
echo "DEBUG=${DEBUG}  MPI_TYPE=${MPI_TYPE}"

if [[ "$DEBUG" == "1" ]]; then
  echo "Interfaces per node:"
  srun -N "$SLURM_JOB_NUM_NODES" -n "$SLURM_JOB_NUM_NODES" --ntasks-per-node=1 \
    bash -lc 'echo HOST=$(hostname); ip -o -4 addr show | awk "{print \$2,\$4}"'
  echo "mpi4py linked against:"
  python -c "import mpi4py.MPI as M; print(M.Get_library_version())"
fi

# ---- Launch ----
SRUN_ARGS=(--mpi="${MPI_TYPE}" --label --kill-on-bad-exit=1)

if [[ "$DEBUG" == "1" ]]; then
  SRUN_ARGS+=(--output="${LOGS}/%x_%j_%t.out" --error="${LOGS}/%x_%j_%t.err")
fi

srun "${SRUN_ARGS[@]}" \
  python -u "$SCRIPT" \
    --input "$INPUT" \
    --output "${RESULTS}/out_${SLURM_JOB_ID}.txt"

deactivate
echo "Done."

See this repository directory for the full example: https://github.com/chicago-aiscience/chicago-aiscience.github.io/tree/main/docs/user_guide/scripts/mpi
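
Before launching a long MPI run, a quick rank check can confirm that mpi4py and the launcher agree. A minimal sketch (assumes mpi4py is installed in your active environment; adjust the partition and --mpi plugin, e.g. pmix_v3 on the DSI cluster):

srun --partition=<PARTITION> -N 2 --ntasks-per-node=2 --time=00:05:00 --mpi=pmix_v3 \
  python -c "from mpi4py import MPI; c = MPI.COMM_WORLD; print(f'rank {c.Get_rank()} of {c.Get_size()} on {MPI.Get_processor_name()}')"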

srun Interactive jobs

Interactive jobs let you run commands directly on a compute node instead of submitting a batch script. They are ideal for debugging code, testing environments, and exploratory analysis (see the table above).

srun: direct interactive jobs (recommended). This is the preferred and most flexible way to start an interactive session.


Basic interactive CPU job

srun --partition=general --nodes=1 --ntasks=1 --cpus-per-task=2 --mem=4G --time=01:00:00 --pty bash -i

Use this for light debugging, testing your environment, and exploratory work directly on a compute node.
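
Once the shell starts on the compute node, you can confirm what was allocated using standard SLURM environment variables:

echo "Job $SLURM_JOB_ID on $(hostname)"
echo "CPUs per task: $SLURM_CPUS_PER_TASK"
scontrol show job "$SLURM_JOB_ID" | grep -E "TimeLimit|TRES"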


Interactive GPU job

srun --partition=schmidt-gpu --gres=gpu:1 --cpus-per-task=4 --mem=32G --time=01:30:00 --pty bash

Verify GPU access:

nvidia-smi
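
If PyTorch is installed in your environment, you can also confirm the framework sees the GPU (substitute the equivalent check for your framework):

python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"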

Attach to a running job

srun --jobid=<JOBID> --pty bash

Useful for inspecting logs or diagnosing issues.
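
For example, once attached you might check GPU utilization or follow the job's log file (the log path is a placeholder):

nvidia-smi
tail -f /path/to/logs/gpu_example_<JOBID>.out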


sinteractive: convenience wrapper (RCC only)

sinteractive is an RCC-provided wrapper around srun.

Example:

sinteractive

With options:

sinteractive --mem=8G --time=01:00:00

Notes:


Open OnDemand (RCC only)

Open OnDemand (OOD) is a web-based interface to the RCC clusters. It provides an alternative to the command line for common tasks and is especially useful for visualization, notebooks, and interactive workflows.

RCC documentation: https://docs.rcc.uchicago.edu/open_ondemand/open_ondemand/

What Open OnDemand is good for

Use Open OnDemand when you want to run Jupyter notebooks, launch interactive desktops or GUI apps, or do other visualization work from the browser (see the table below).

Open OnDemand still submits jobs through SLURM—it does not bypass scheduling or resource limits.


Accessing Open OnDemand

  1. Go to: https://midway3-ondemand.rcc.uchicago.edu

  2. Log in with your UChicago credentials

  3. Choose an app or interactive session from the menu


Open OnDemand vs srun and sbatch

Task                           | Recommended
Debugging via terminal         | srun
Jupyter notebooks              | Open OnDemand
Interactive desktop / GUI apps | Open OnDemand
Scripted, repeatable workflows | sbatch
Lightweight exploration        | Either

Important notes


Common mistakes

Software Modules

On RCC clusters (like Midway2 / Midway3) and to some extent the DSI cluster, most scientific software isn’t available in your shell by default.

Instead, packaged tools, languages, libraries, and compilers are managed through Environment Modules, a system that lets you load, unload, and switch software versions cleanly.

A module is basically a script that sets up your environment (e.g., PATH, LD_LIBRARY_PATH) so a specific software package and its dependencies become available in your session.

You use the module command (module avail, module load, etc.) to interact with these.
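
You can also inspect what a module would change before loading it; the module name below is a hypothetical example (use one listed by module avail):

module show cuda/12.1    # prints the PATH, LD_LIBRARY_PATH, and other changes the module applies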

The benefit: no conflicting software versions, easy switching between versions, and reproducibility across compute sessions.

Modules may also provide:

These support both building complex codes and running them smoothly on compute nodes.

How to run software modules

  1. See what’s available:

module avail

  2. Load what you need:

module load <software>/<version>

  3. Check what’s loaded:

module list

  4. Run your code inside a job script or interactive session.
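
A typical session might look like the following (the module name and version are hypothetical; use whatever module avail shows on your cluster):

module avail python          # list Python-related modules
module load python/3.11      # hypothetical version string
module list                  # confirm it is loaded
python --version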

Monitoring Jobs

Once you submit a job, SLURM provides several commands to help you track its status, priority, and resource usage.

Check Job Status: squeue

View your running and pending jobs:

squeue -u $USER

Key fields include job ID, state (PD = pending, R = running), elapsed time, and node or pending reason.

For a custom view:

squeue -u $USER -o "%.18i %.9P %.8j %.2t %.10M %.6D %R"

Check Job Priority: sprio

If a job is pending, use sprio to understand its scheduling priority:

sprio -u $USER

sprio only prints output for pending jobs. It shows how factors like job age, fairshare, and job size affect when your job will start.

Check Job Efficiency: seff

After a job finishes, summarize how efficiently it used resources:

seff <JOBID>

Use CPU and memory efficiency to adjust future job requests.
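
For more detail than seff, sacct reports per-step accounting; the fields below are standard sacct format fields:

sacct -j <JOBID> --format=JobID,JobName,Elapsed,MaxRSS,State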