# Running Jobs

Learn how to submit, monitor, and manage jobs on REPACSS using SLURM.
## Job Types

### Interactive vs Batch Jobs

#### Interactive Jobs

```bash
interactive -c 8 -p h100
```
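Once the session starts, SLURM exports environment variables describing the allocation. A quick sanity check (a sketch; the `unset` fallbacks apply when the commands run outside a job):

```shell
# SLURM sets these inside an allocation; fall back to "unset" so the
# commands are safe to run anywhere.
echo "Job ID:    ${SLURM_JOB_ID:-unset}"
echo "CPUs:      ${SLURM_CPUS_PER_TASK:-unset}"
echo "Partition: ${SLURM_JOB_PARTITION:-unset}"
```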
#### Batch Jobs

```bash
sbatch job.sh
sbatch -p zen4 job.sh
sbatch -p h100 job.sh
```
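On success, `sbatch` prints a line of the form `Submitted batch job <id>`. Scripts that chain jobs often capture that ID; a sketch using a sample message so the fragment runs without a scheduler (in practice the first line would be `msg=$(sbatch job.sh)`):

```shell
# Sample sbatch output; in a real session: msg=$(sbatch job.sh)
msg="Submitted batch job 12345"
jobid=${msg##* }    # strip everything up to the last space, keeping the ID
echo "$jobid"       # prints 12345
# The ID can then feed a dependency:
#   sbatch --dependency=afterok:$jobid postprocess.sh
```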
## Job Scripts

### Script Templates

#### Basic Template

```bash
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=test.out
#SBATCH --error=test.err
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G

# Load modules
module load <module_name>

# Run program
./my_program
```
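Because the shell treats `#SBATCH` lines as ordinary comments, a job script can also be run directly for a quick, lightweight smoke test before submission. A minimal sketch with a throwaway stand-in script:

```shell
# #SBATCH directives are plain shell comments, so running the script
# directly executes only its commands (keep such local tests lightweight).
cat > demo_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --time=01:00:00
echo "payload ran"
EOF
out=$(bash demo_job.sh)
echo "$out"    # prints: payload ran
rm demo_job.sh
```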
#### Python Template

```bash
#!/bin/bash
#SBATCH --job-name=python_job
#SBATCH --output=python_job.out
#SBATCH --error=python_job.err
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G

# Load required modules
module load gcc

# Activate conda environment
source ~/miniforge3/etc/profile.d/conda.sh
conda activate myenv

# Run Python script
python script.py
```
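Thread-based libraries (NumPy with OpenBLAS, OpenMP code) otherwise spawn one thread per physical core, which oversubscribes a shared node. A common sketch is to pin thread counts to the allocation before launching Python (the fallback of 1 applies when run outside a job):

```shell
# Match library thread pools to the SLURM allocation; default to 1
# when SLURM_CPUS_PER_TASK is unset (e.g. outside a job).
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
export OPENBLAS_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
echo "Using $OMP_NUM_THREADS threads"
```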
#### GPU Template

```bash
#!/bin/bash
#SBATCH --job-name=gpu_test
#SBATCH --output=gpu_test.out
#SBATCH --error=gpu_test.err
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1

# Load modules
module load cuda

# Run program
./gpu_program
```
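With `--gres=gpu:1`, SLURM confines the job to its assigned device by setting `CUDA_VISIBLE_DEVICES`. A quick check at the top of a GPU script (a sketch: it prints `none` outside an allocation, and `nvidia-smi -L` lists devices only where the driver is installed):

```shell
# SLURM sets CUDA_VISIBLE_DEVICES for jobs requesting --gres=gpu:N.
echo "Visible GPUs: ${CUDA_VISIBLE_DEVICES:-none}"
# List devices when the NVIDIA driver is present; skip quietly otherwise.
command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L || true
```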
## Job Management

### Submission

- `sbatch job.sh` - Submit job
- `sbatch --array=1-10 job.sh` - Array jobs
- `sbatch --dependency=afterok:12345 job.sh` - Job dependencies
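Each task of an array job (`--array=1-10`) runs the same script with a different `SLURM_ARRAY_TASK_ID`. A sketch of using the index to select per-task input (the `input_N.dat` naming is illustrative; the default of 1 applies outside an array job):

```shell
# Pick this task's input file from the array index.
task=${SLURM_ARRAY_TASK_ID:-1}      # set by SLURM inside an array job
infile="input_${task}.dat"          # hypothetical naming scheme
echo "Task $task -> $infile"
```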
### Monitoring

- `squeue -u $USER` - Your jobs
- `squeue -p zen4` - Zen4 partition
- `squeue -p h100` - H100 partition
### Control

- `scancel 12345` - Cancel a specific job
- `scancel -u $USER` - Cancel all your jobs
- `scancel -p zen4` - Cancel your jobs in a partition
## Resource Requests

### Resource Types

- **CPU Jobs**: Use `--nodes`, `--ntasks`, `--cpus-per-task`, and `--mem`
- **GPU Jobs**: Add `--gres=gpu:1` (or `gpu:2`, `gpu:4`)
- **Python Jobs**: See Python Environment Setup for specific configurations
- Consider using `--cpus-per-task` for parallel Python processing
- Adjust `--mem` based on your data processing needs
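To right-size `--mem`, compare requests against actual usage from accounting: after a job finishes, `sacct -j <jobid> --format=JobID,Elapsed,MaxRSS` reports its peak resident memory. A sketch of turning a `MaxRSS` value into a request with headroom (the sample value is illustrative, standing in for real `sacct` output):

```shell
# Sample MaxRSS as sacct reports it (KiB with a K suffix); in practice:
#   maxrss=$(sacct -j <jobid> --format=MaxRSS --noheader | head -1)
maxrss="3500000K"
kib=${maxrss%K}                       # drop the unit suffix
gib=$(( (kib + 1048575) / 1048576 ))  # KiB -> GiB, rounded up
echo "peak ~${gib}G; consider --mem=$((gib + 1))G"   # ~1G headroom
```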