GPU Workload Orchestration
Schedule and manage GPU workloads across multiple nodes with PodWarden — AI inference, media transcoding, rendering, and more.
GPU workloads — AI inference, media transcoding, 3D rendering, computer vision — are increasingly common in homelabs and small teams. But scheduling GPU work across multiple machines is complex. PodWarden provides GPU-aware workload orchestration on K3s, making multi-GPU infrastructure manageable.
The GPU Scheduling Problem
Running GPU workloads on a single machine with Docker is straightforward: pass through the GPU device and run your container. But as soon as you have multiple GPU machines or multiple GPU workloads, problems emerge:
- Resource conflicts: Two workloads competing for the same GPU
- Manual placement: You decide which workload goes on which machine
- No resource tracking: No central view of GPU utilization across your fleet
- VRAM management: Workloads crash when they exceed available VRAM
- No failover: If a GPU node goes down, workloads don't reschedule
How PodWarden Handles GPU Workloads
Hardware Discovery
When you provision a host with PodWarden, it automatically discovers GPU hardware — NVIDIA GPUs, their models, VRAM capacity, and driver versions. This information is visible in the dashboard and used for scheduling decisions.
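You can preview the same hardware details from any GPU host before provisioning. This is a plain `nvidia-smi` query (part of the NVIDIA driver install), showing the model, VRAM capacity, and driver version that discovery surfaces:

```shell
# List each GPU's model, total VRAM, and driver version in CSV form
nvidia-smi --query-gpu=index,name,memory.total,driver_version --format=csv
```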
Resource-Aware Scheduling
PodWarden's workload definitions include GPU resource requests:
- GPU count: How many GPUs the workload needs
- VRAM request: Minimum VRAM required
When you deploy a GPU workload, PodWarden schedules it on a node with sufficient GPU resources. If no node has available GPUs, the workload queues until resources free up — no silent failures or resource conflicts.
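Under the hood, GPU count requests map onto standard Kubernetes GPU resources. A minimal sketch, assuming the stock NVIDIA device plugin is installed (which PodWarden sets up during provisioning):

```yaml
# Container-level fragment: request one whole GPU from the scheduler.
# The nvidia.com/gpu resource is advertised by the NVIDIA device plugin.
resources:
  limits:
    nvidia.com/gpu: 1   # GPU count; whole GPUs are allocated per container
```

Note that the stock device plugin allocates whole GPUs only; a VRAM request is not a native Kubernetes resource limit, so VRAM-aware placement is presumably a scheduler-side decision PodWarden makes from the discovered hardware data.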
GPU Templates in the Catalog
PodWarden's template catalog includes pre-configured GPU workloads:
| Application | GPU Use | Category |
|---|---|---|
| Ollama | LLM inference | AI |
| LocalAI | Multi-model inference | AI |
| Stable Diffusion WebUI | Image generation | AI |
| ComfyUI | Image generation workflows | AI |
| Plex | Hardware transcoding | Media |
| Jellyfin | Hardware transcoding | Media |
| Frigate NVR | Object detection | Surveillance |
| Whisper | Speech-to-text | AI |
Each template comes with appropriate GPU resource requests, NVIDIA runtime configuration, and environment variables pre-configured.
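As a rough illustration, here is the shape of Kubernetes manifest an LLM-inference template like Ollama can expand to. The image and port are Ollama's published defaults; the labels and runtime class name are assumptions about PodWarden's generated output, not its documented format:

```yaml
# Sketch of a generated GPU workload: one Ollama replica on one dedicated GPU.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels: { app: ollama }
  template:
    metadata:
      labels: { app: ollama }
    spec:
      runtimeClassName: nvidia        # NVIDIA container runtime class
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434    # Ollama's default API port
          resources:
            limits:
              nvidia.com/gpu: 1       # one dedicated GPU, no sharing
```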
Mixed Workload Clusters
Most infrastructure runs a mix of GPU and non-GPU workloads. PodWarden handles this naturally — GPU workloads are scheduled on GPU nodes, everything else goes on general-purpose nodes. You don't need separate clusters for GPU and non-GPU work.
Example cluster topology:
| Node | GPUs | Workloads |
|---|---|---|
| node-1 (NUC) | None | Home Assistant, Pi-hole, PostgreSQL, Redis |
| node-2 (NAS) | None | Jellyfin (CPU), Nextcloud, Immich (CPU) |
| node-3 (Workstation) | RTX 4090 | Ollama, Stable Diffusion, Frigate NVR |
| node-4 (Server) | 2x RTX 3090 | Whisper, LocalAI, Plex (HW transcode) |
PodWarden schedules each workload on the appropriate node based on resource requirements. If node-3 is fully utilized, a new GPU workload waits or goes to node-4 if it has capacity.
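You can inspect the GPU capacity each node advertises to the scheduler directly with `kubectl`, since the NVIDIA device plugin publishes `nvidia.com/gpu` as an allocatable resource on every GPU node:

```shell
# Show per-node GPU counts as the scheduler sees them
# (the backslash escapes the dots in the resource name)
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
```

Nodes without GPUs show `<none>` in the GPUS column, which is why non-GPU workloads never contend with GPU ones for those resources.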
Multi-GPU Use Cases
AI Inference Serving
Run multiple LLM models across GPU nodes: Ollama on one GPU for general chat, a specialized model on another for code generation. PodWarden ensures each gets dedicated GPU resources without conflicts.
Media Processing Pipeline
Ingest → transcode → serve: Frigate captures video and runs object detection on GPU, Plex or Jellyfin uses hardware transcoding for streaming, and PodWarden schedules all of it across your GPU-capable nodes.
Render Farm
Distribute Blender or other rendering jobs across multiple GPU nodes. PodWarden's workload definitions support job-type workloads that run to completion and release GPU resources.
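In Kubernetes terms, a run-to-completion workload is a `Job`. A sketch of what a render job can look like, with the image name and arguments as hypothetical placeholders:

```yaml
# Run-to-completion render job: the GPU is returned to the scheduler
# as soon as the job finishes (or exhausts its retries).
apiVersion: batch/v1
kind: Job
metadata:
  name: blender-render
spec:
  backoffLimit: 2                      # retry a failed render twice
  template:
    spec:
      restartPolicy: Never
      runtimeClassName: nvidia
      containers:
        - name: render
          image: blender-render:latest          # hypothetical render image
          args: ["--scene", "shot-01.blend"]    # hypothetical scene argument
          resources:
            limits:
              nvidia.com/gpu: 1
```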
Getting Started with GPU Workloads
1. Install NVIDIA drivers on your GPU hosts (before PodWarden provisioning)
2. Provision GPU hosts into PodWarden — GPU hardware is auto-discovered
3. Create a cluster including your GPU nodes
4. Deploy GPU workloads from the template catalog or custom definitions
5. Monitor GPU utilization from the PodWarden dashboard
PodWarden configures the NVIDIA container runtime and Kubernetes device plugins during provisioning — you don't need to manually set up GPU passthrough for K3s.
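After provisioning, a one-off pod is a quick way to confirm containers can actually see the GPU. This uses NVIDIA's public CUDA base image (the exact tag may vary); if everything is wired up, the pod's logs show the familiar `nvidia-smi` table:

```yaml
# Smoke-test pod: runs nvidia-smi once on an allocated GPU and exits.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-check
spec:
  restartPolicy: Never
  runtimeClassName: nvidia
  containers:
    - name: gpu-check
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```

Apply it with `kubectl apply -f gpu-check.yaml`, then read the result with `kubectl logs gpu-check`.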
Why K3s for GPU Workloads
K3s provides several advantages over plain Docker for GPU workload management:
- Resource scheduling: Workloads are placed on nodes with available GPUs automatically
- Health checks: GPU workloads that crash are restarted automatically
- Resource limits: Prevent workloads from consuming more GPU/VRAM than allocated
- Rolling updates: Update GPU workload images without downtime
- Multi-node: Distribute GPU workloads across your fleet from one control plane
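The health-check point above comes from standard Kubernetes probes. For example, an HTTP liveness probe restarts an inference container whose API stops responding (the port here matches Ollama's default; the timing values are illustrative):

```yaml
# Container-level fragment: restart the container if its HTTP API
# stops answering. Ollama's root path returns 200 when healthy.
livenessProbe:
  httpGet:
    path: /
    port: 11434
  initialDelaySeconds: 30   # allow time for model loading on startup
  periodSeconds: 15
```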
PodWarden makes K3s GPU scheduling accessible without requiring deep Kubernetes knowledge — configure GPU requirements in the template or workload definition, and PodWarden handles the rest.