Parallel Inference on Slurm Clusters¶
CryoSiam inference commands (e.g. denoising, semantic segmentation, instance segmentation, embeddings) are executed on a single GPU per command-line call.
When processing many tomograms, the recommended way to scale inference on HPC systems is to parallelize over tomograms, running one tomogram per Slurm job.
This page explains how to do this safely and efficiently using Slurm job arrays or batched submissions.
Key idea¶
- Each CryoSiam inference command uses 1 GPU
- Parallelism is achieved by:
  - Splitting tomograms by filename
  - Submitting one Slurm job per tomogram
- The --filename argument is used to restrict each job to a single tomogram
This approach avoids GPU contention and scales linearly with the number of available GPUs.
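For example, a single-tomogram call (shown here for instance segmentation, using the config file from the scripts below and a placeholder tomogram name) looks like this:
# One GPU, one tomogram: --filename selects a single file from the data folder
cryosiam instance_predict \
    --config_file=config_dense_simsiam_instance.yaml \
    --filename=TS_01.mrc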
Typical use cases¶
- Denoising a large dataset
- Semantic segmentation over many tomograms
- Instance segmentation at scale
- Subtomogram embeddings generation
Step 1: Prepare a submission script¶
Below is an example Slurm submission script that runs instance segmentation on a single tomogram passed as a command-line argument.
Save this as run_instance_predict.sbatch:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --mem=50G
#SBATCH --time=1-00:00:00
#SBATCH --gres=gpu:1
#SBATCH --job-name="cryosiam_instance_predict"
#SBATCH -o /scratch/dataset/slurm.%N.%j.out
#SBATCH -e /scratch/dataset/slurm.%N.%j.err
echo "Starting CryoSiam instance prediction"
echo "Tomogram: $1"
cd /scratch/dataset/scripts
conda activate cryosiam
cryosiam instance_predict \
--config_file=config_dense_simsiam_instance.yaml \
--filename=$1
echo "Done."
Step 2: Submit jobs for all tomograms in a folder¶
Assuming all tomograms are located in:
/scratch/dataset/data/
and have extension .mrc, you can submit one job per tomogram using:
for f in /scratch/dataset/data/*.mrc; do
fname=$(basename "$f")
sbatch run_instance_predict.sbatch "$fname"
done
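If you want to check what would be submitted first, a dry run that only prints the commands is a simple safeguard (a convenience pattern, not part of CryoSiam):
# Dry run: print the sbatch commands without submitting anything
for f in /scratch/dataset/data/*.mrc; do
    fname=$(basename "$f")
    echo sbatch run_instance_predict.sbatch "$fname"
done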
Job arrays (recommended for large datasets)¶
If you have hundreds to thousands of tomograms, a Slurm job array is often cleaner than submitting many individual jobs.
The idea is:
- Create a text file containing one filename per line (e.g. tomograms.txt)
- Submit an array job where each task processes one line from that file
- Use --array=0-(N-1) to match the number of tomograms
Step 1: Create a file list (tomograms.txt)¶
From a folder containing .mrc tomograms:
ls -1 /scratch/dataset/data/*.mrc | xargs -n 1 basename > tomograms.txt
This produces a plain text file like:
TS_01.mrc
TS_02.mrc
TS_03.mrc
...
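As a quick sanity check, the number of lines in tomograms.txt should match the number of .mrc files in the data folder:
# Both counts should be identical
wc -l < tomograms.txt
ls -1 /scratch/dataset/data/*.mrc | wc -l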
Step 2: Array sbatch script¶
Save this as run_instance_predict_array.sbatch:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --mem=50G
#SBATCH --time=1-00:00:00
#SBATCH --gres=gpu:1
#SBATCH --job-name="cryosiam_instance_predict"
#SBATCH -o /scratch/dataset/slurm.%A_%a.out
#SBATCH -e /scratch/dataset/slurm.%A_%a.err
#SBATCH --array=0-99
set -euo pipefail
LIST_FILE=/scratch/dataset/scripts/tomograms.txt
TOMO=$(sed -n "$((SLURM_ARRAY_TASK_ID+1))p" "$LIST_FILE")
echo "Array task: $SLURM_ARRAY_TASK_ID"
echo "Tomogram: $TOMO"
cd /scratch/dataset/scripts
conda activate cryosiam
cryosiam instance_predict \
--config_file=config_dense_simsiam_instance.yaml \
--filename="$TOMO"
echo "Done."
Important: replace --array=0-99 with the correct range for your dataset.
For example, if tomograms.txt has 237 lines, use --array=0-236.
Step 3: Submit the array job¶
# Count tomograms
N=$(wc -l < tomograms.txt)
# Submit array using 0..N-1
sbatch --array=0-$((N-1)) run_instance_predict_array.sbatch
(Optional) Limit the number of concurrent GPUs¶
Slurm lets you cap how many array tasks run at the same time by appending a % limit to the array range:
sbatch --array=0-$((N-1))%20 run_instance_predict_array.sbatch
This example runs at most 20 tasks at once, which helps avoid overwhelming the scheduler or the filesystem.
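On recent Slurm versions the throttle can also be changed after submission with scontrol (replace <jobid> with the array job ID; check your site's Slurm documentation if this field is unavailable):
# Raise or lower the concurrency cap of an already-submitted array job
scontrol update JobId=<jobid> ArrayTaskThrottle=10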
Adapting job arrays to other modules¶
Replace the command in the script with the module you need, for example:
- Denoising: cryosiam denoise_predict --config_file=config_denoising.yaml --filename="$TOMO"
- Semantic segmentation: cryosiam semantic_predict --config_file=config_semantic.yaml --filename="$TOMO"
- Embeddings (per tomogram): cryosiam simsiam_embeddings_predict --config_file=config_subtomo_embeddings.yaml --filename="$TOMO"
Best practices¶
- Use one GPU per job
- Avoid running multiple CryoSiam jobs on the same GPU
- Monitor jobs with squeue -u $USER (see the sacct example after this list for completed jobs)
- Adjust memory and time requests depending on tomogram size
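To review finished or failed array tasks afterwards, sacct gives a per-task summary (replace <jobid> with the array job ID reported by sbatch):
# State, runtime, peak memory and exit code for every task in the array
sacct -j <jobid> --format=JobID,State,Elapsed,MaxRSS,ExitCode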