Setup Guide for Running Benchmarks on WATGPU
Prerequisites
- Access to WATGPU cluster (UWaterloo CS department)
Installation Guide
1. Connect to WATGPU Server
ssh <your_username>@watgpu.cs.uwaterloo.ca
Note: Replace `<your_username>` with your UWaterloo username.
2. Clone the Repository
# Navigate to home directory (REQUIRED - must be in HOME directory)
cd ~
# Clone the repository
git clone https://github.com/Blood-Glucose-Control/nocturnal-hypo-gly-prob-forecast.git
3. Set Up Python Environment
# Enter project directory
cd nocturnal-hypo-gly-prob-forecast
# Create virtual environment with Python 3.11
python3.11 -m venv .noctprob-venv
# Activate the virtual environment
source .noctprob-venv/bin/activate
Note: The server comes with a base conda environment that includes Python 3.11.
4. Install Dependencies
# Install required packages
pip install -r requirements.txt
# Install the project package in development mode
pip install -e .
Job Submission Guidelines
⚠️ IMPORTANT: Server Usage Policy
- **NEVER RUN SCRIPTS DIRECTLY ON THE WATGPU LOGIN SERVER**
- The login server is for job submission only
- All script execution must use `sbatch`
- Reference: [How to submit a job](https://watgpu.cs.uwaterloo.ca/slurm.html)
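To illustrate the policy above, here is a minimal sketch of a Slurm job script. It assumes the resource values from the examples in this guide; `your_script.py` is a hypothetical placeholder, and WATGPU may require further options (for example a partition or GPU request) that are not shown here. The snippet only writes the script to a temporary file; the `sbatch` call is left as a comment.

```shell
# Write a minimal, hypothetical Slurm job script to a temporary file.
job_script=$(mktemp)
cat > "$job_script" <<'EOF'
#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=02:00:00
#SBATCH --output=JOB%j.out
#SBATCH --error=JOB%j.err

# Activate the project environment, then run the workload on the compute node.
source ~/nocturnal-hypo-gly-prob-forecast/.noctprob-venv/bin/activate
python your_script.py   # hypothetical script name
EOF

echo "Job script written to $job_script"
# Submit from the login node (not executed here):
#   sbatch "$job_script"
```

In the `--output` and `--error` patterns, Slurm replaces `%j` with the job ID, which yields file names of the form `JOB<jobid>.out` and `JOB<jobid>.err`.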
Project Structure
All scripts are located in `~/nocturnal-hypo-gly-prob-forecast/scripts/watgpu/`.
Key files: `job.sh` and `batch.sh`, used below to submit jobs with proper resource allocation.
Job Submission Process
1. Configure job.sh
Resource and YAML Configuration:
declare -A job_specs=(
["0_naive_05min.yaml"]="1 4 02:00:00"
["0_naive_15min.yaml"]="1 3 02:00:00"
)
Format:
[yaml_file]="cores memory(GB) time(HH:MM:SS)"
Note: The queue time limit is 7 days maximum.
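As a rough illustration of the spec format, the three fields could map onto standard `sbatch` options. The sketch below assumes cores correspond to `--cpus-per-task`, memory to `--mem` in GB, and time to `--time`. The helper name `build_sbatch_cmd` and the file `your_job_script.sh` are hypothetical, and the repository's own scripts may wire this differently. The function only prints the command; it does not submit anything.

```shell
# Hypothetical helper: turn one job_specs entry into an sbatch command line.
build_sbatch_cmd() {
    yaml_file=$1
    spec=$2
    # Split "cores memory time" on whitespace into positional parameters.
    set -- $spec
    if [ "$#" -ne 3 ]; then
        echo "bad spec for $yaml_file: '$spec'" >&2
        return 1
    fi
    # Assumed mapping: cores -> --cpus-per-task, memory (GB) -> --mem, time -> --time.
    echo "sbatch --cpus-per-task=$1 --mem=${2}G --time=$3 your_job_script.sh $yaml_file"
}

build_sbatch_cmd "0_naive_05min.yaml" "1 4 02:00:00"
# prints: sbatch --cpus-per-task=1 --mem=4G --time=02:00:00 your_job_script.sh 0_naive_05min.yaml
```

Printing the command first makes it easy to check the requested resources before anything reaches the queue.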
Email Notification:
email="your.email@example.com"
Run Description:
description="This run evaluates the impact of removing exogenous variables (IOB and COB)
to determine if there is any performance degradation compared to baseline."
Add a clear explanation of:
- The purpose of this run
- Why you're running this experiment
- Key changes from previous runs
2. Submit the Job
cd ~/nocturnal-hypo-gly-prob-forecast/scripts/watgpu/
bash batch.sh
# Example output: Submitted batch job 12345
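The job ID in the submission message is what `squeue` and `scancel` expect later. A small sketch of pulling it out of a message line follows; the helper name `extract_job_id` is hypothetical, and it assumes the message has the form `Submitted batch job <id>`.

```shell
# Hypothetical helper: print the numeric job ID from a
# "Submitted batch job <id>" line; prints nothing if the line does not match.
extract_job_id() {
    printf '%s\n' "$1" | sed -n 's/^Submitted batch job \([0-9][0-9]*\).*/\1/p'
}

job_id=$(extract_job_id "Submitted batch job 12345")
echo "$job_id"
# prints: 12345

# On the cluster, the ID could then be used to inspect or cancel that job:
#   squeue -j "$job_id"
#   scancel "$job_id"
```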
Results Location
Log Files:
- Located in `scripts/watgpu/`
- `JOB<jobid>.out`: Standard output
- `JOB<jobid>.err`: Error messages
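When several jobs have run, finding the newest log by hand gets tedious. The sketch below assumes the `JOB<jobid>.out` naming described above; the helper name `latest_job_log` is hypothetical.

```shell
# Hypothetical helper: print the most recently modified JOB*.out file
# in a directory (defaults to the current directory).
latest_job_log() {
    log_dir=${1:-.}
    # ls -t sorts newest first; head keeps only the first entry.
    ls -t "$log_dir"/JOB*.out 2>/dev/null | head -n 1
}

# Usage on the cluster, once at least one job has produced output:
#   tail -f "$(latest_job_log ~/nocturnal-hypo-gly-prob-forecast/scripts/watgpu)"
```

The function prints nothing if no matching log exists, so it is worth checking for an empty result before passing it to `tail`.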
Results Directory:
Check `results/processed/` for a folder named with the run timestamp. It contains:
- Configuration details
- Performance metrics from different scorers
SLURM Reference Guide
Resource Monitoring
CPU Status:
sinfo -o "%C"
- A: Allocated (in use)
- I: Idle (available)
- O: Other (down/maintenance)
- T: Total CPUs
GPU Status:
sinfo -o "%n %G"
Memory Status:
sinfo -o "%n %m"
Job Management
View Your Jobs:
# Basic job status
squeue -u $USER
# Detailed job information
squeue -u "$USER" -o "%.18i %.9P %.15j %.8u %.2t %.10M %.6D %C %.6m"
Shows: JobID, Partition, JobName, User, State, Time, Nodes, CPUs, Memory
Control Jobs:
# Cancel a specific job
scancel <jobid>
# Cancel all your jobs
scancel -u $USER