Video Transcoding at Scale
Build a GPU-accelerated transcoding pipeline on Jetson Orin NX clusters or traditional GPU servers — all supporting infrastructure included
Video transcoding is compute-heavy and highly parallel. Every video is independent, so more workers means more throughput. The challenge is managing the fleet: deploying workers consistently, routing input and output through shared storage, and scaling up or down as demand changes.
The right hardware for a transcoding farm isn't always a rack of A100s. For most workloads — HLS ladder generation, clip transcoding, live stream packaging — the NVIDIA Jetson Orin NX is the ideal building block: dedicated NVENC and NVDEC engines, low power draw, ARM64, and a price point that makes large clusters economically practical. PodWarden runs on ARM64 and manages Jetson nodes exactly like any other host.
The Hub catalog covers every supporting service the pipeline needs. If you already have S3, a job queue, or a monitoring stack, bring it. If you don't, everything is available.
What You Need
| Component | Bring your own | Or deploy from Hub |
|---|---|---|
| Object storage | AWS S3, GCS, Wasabi, MinIO already running | MinIO or RustFS — deploy to any node, exposes S3 API |
| Job queue | Redis, RabbitMQ, SQS, NATS | Redis or RabbitMQ — from the Hub catalog |
| Database | Existing PostgreSQL (for job tracking UI) | PostgreSQL — from Hub |
| Secrets | Vault, AWS Secrets Manager | Vault — from Hub |
| Monitoring | Existing Prometheus + Grafana | Prometheus + Grafana + DCGM Exporter |
| Container registry | Docker Hub, GHCR | Harbor or Gitea — needed if you maintain custom ARM64 worker images |
Stack Architecture
The Jetson Orin NX Advantage
The Jetson Orin NX is a system-on-module with dedicated hardware video engines that make it exceptional for transcoding:
| Feature | Orin NX 16GB | Why it matters |
|---|---|---|
| NVENC | 1× dedicated encode engine | H.264, H.265, AV1 hardware encode — zero CPU load |
| NVDEC | 1× dedicated decode engine | Hardware decode of input streams |
| CUDA cores | 1024 | Available for filters, scaling, color conversion |
| Power draw | 10–25W | Dense clusters without specialized power infrastructure |
| Architecture | ARM64 | Standard Linux, standard NVIDIA container runtime |
| Form factor | SoM (67.6 × 45 mm) | Compact carrier boards, rack-mount sleds |
The NVENC and NVDEC engines are independent of the CUDA cores — they run simultaneously without contention. A single Orin NX sustains multiple concurrent 1080p60 encode sessions while the CPU handles queue polling, format demuxing, and S3 upload.
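To make the decode-on-NVDEC / encode-on-NVENC split concrete, here is a minimal Python sketch that assembles an FFmpeg command line for a hardware-accelerated H.264 encode. It assumes an NVENC-enabled FFmpeg build on the node; the file names and bitrate are illustrative, not part of the stack above.

```python
# Sketch: build an FFmpeg command that decodes on NVDEC and encodes on
# NVENC, leaving the CUDA cores free for filters. Assumes an
# NVENC-enabled FFmpeg build; paths and bitrate are illustrative.
import subprocess

def nvenc_cmd(src: str, dst: str, preset: str = "p4", bitrate: str = "5M") -> list[str]:
    return [
        "ffmpeg", "-y",
        "-hwaccel", "cuda",    # hardware decode via NVDEC
        "-i", src,
        "-c:v", "h264_nvenc",  # hardware encode via NVENC
        "-preset", preset,     # p1 (fastest) .. p7 (best quality)
        "-b:v", bitrate,
        "-c:a", "copy",        # pass audio through untouched
        dst,
    ]

if __name__ == "__main__":
    subprocess.run(nvenc_cmd("input.mp4", "output.mp4"), check=True)
```

Running two or three of these processes side by side is what the CONCURRENCY setting on the worker controls.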
Because Orin NX modules are inexpensive, you can build clusters that would be cost-prohibitive with traditional GPU servers. A rack of Jetson nodes — each running two or three concurrent encode sessions — often outperforms a handful of A100 machines for this workload, at a fraction of the cost and power budget.
Jetson vs x86 Topology
Building the Foundation
Deploy supporting services before the transcoding workers. They're standard stacks — import from Hub, assign to your cluster, deploy.
Transcoding Pipeline Flow
1. Object Storage
Workers need to read source video and write encoded output. Register an existing S3 endpoint as a storage connection under Settings → Storage. PodWarden tests connectivity from all cluster nodes and injects credentials as environment variables at deploy time.
If you don't have S3 storage:
- MinIO — Import from Hub. Deploy to a node with fast disk (NVMe recommended for video I/O). Exposes a full S3 API; every other component treats it identically to AWS S3.
- RustFS — High-performance alternative, also S3-compatible. Better suited for high-throughput encode pipelines with many simultaneous workers.
Create two buckets: one for ingest (source files), one for output (encoded files).
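A small sketch of that bucket setup, written to be idempotent against any S3-compatible endpoint (MinIO, RustFS, AWS). The client is injected so the same function works with a real boto3 client or a test double; the endpoint URL and bucket names mirror the examples used elsewhere in this guide.

```python
# Sketch: idempotent creation of the ingest and output buckets against
# any S3-compatible endpoint. The client is injected (boto3 or a fake);
# bucket names match the examples used elsewhere in this guide.
BUCKETS = ("media-ingest", "media-output")  # source files, encoded output

def ensure_buckets(s3_client, buckets=BUCKETS):
    existing = {b["Name"] for b in s3_client.list_buckets().get("Buckets", [])}
    for name in buckets:
        if name not in existing:
            s3_client.create_bucket(Bucket=name)

if __name__ == "__main__":
    import boto3  # requires: pip install boto3
    client = boto3.client("s3", endpoint_url="http://minio.mesh:9000")
    ensure_buckets(client)
```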
2. Job Queue
Workers poll the queue for jobs, transcode, and report completion. Pick based on what you know:
- Redis — Simple, fast, widely supported. Import from Hub. Workers use `BLPOP` or a queue library.
- RabbitMQ — More durable, supports dead-letter queues for failed jobs. Better for high-volume pipelines where job loss is not acceptable.
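The poll loop itself is short. This sketch assumes jobs are JSON blobs on a Redis list and uses redis-py's `blpop` semantics (returns a `(key, value)` pair, or `None` on timeout); the job field names are an assumption of the sketch, and the transcode step is stubbed out behind `handle`.

```python
# Sketch of the worker poll loop. Jobs are assumed to be JSON blobs on a
# Redis list; redis-py's blpop returns (key, value) or None on timeout.
import json

def parse_job(raw: bytes) -> dict:
    """Decode one queued job; the expected field names are an assumption."""
    job = json.loads(raw)
    for key in ("input_key", "output_key", "preset"):
        if key not in job:
            raise ValueError(f"malformed job, missing {key!r}")
    return job

def run_worker(queue, handle, queue_name="transcode-jobs"):
    """`queue` needs a redis-py-style blpop(name, timeout=...) method."""
    while True:
        item = queue.blpop(queue_name, timeout=30)
        if item is None:
            continue  # idle poll; a kind: Job worker would exit here instead
        _, raw = item
        handle(parse_job(raw))
```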
If you use a cloud queue (SQS, Cloud Tasks), set the QUEUE_URL environment variable accordingly — no Hub component needed.
3. Monitoring
Import Prometheus, Grafana, and DCGM Exporter from Hub.
Deploy DCGM Exporter as a DaemonSet — it runs on every GPU node automatically and exposes per-GPU metrics including NVENC/NVDEC engine utilization (on supported drivers). On Jetson, the equivalent is Tegrastats Exporter — also available from the Hub catalog.
Grafana shows queue depth, encode throughput (frames/second per node), GPU/encoder utilization, and error rates. This tells you immediately when a node is stalled, when the queue is backing up, or when a bad source file is causing repeated failures.
4. Secrets
Store S3 credentials, queue passwords, and registry credentials in Vault (from Hub) or your existing secrets manager. PodWarden injects secrets via secret_refs at deploy time — they never appear in template definitions or deployment logs.
5. Container Registry (optional but recommended)
Jetson workers require ARM64 images built on top of NVIDIA's L4T base (nvcr.io/nvidia/l4t-base). You'll likely maintain a custom FFmpeg image for your pipeline.
Deploy Harbor or Gitea (with built-in registry) from Hub. Build your ARM64 FFmpeg image once and push it there. All Jetson workers pull from your internal registry — no external registry dependency, no rate limits.
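A minimal sketch of what that custom image might look like. The base image is the one named above; the tag and package choices are assumptions — on Jetson, the stock Ubuntu `ffmpeg` package may lack NVENC support, so production images often build FFmpeg from source against the L4T multimedia APIs instead.

```dockerfile
# Sketch only: base image from this guide; tag and packages are assumptions.
ARG L4T_TAG=r36.2.0   # match the JetPack/L4T release running on your nodes
FROM nvcr.io/nvidia/l4t-base:${L4T_TAG}

# Stock ffmpeg may lack NVENC on Jetson -- production images typically
# build FFmpeg from source against the L4T multimedia APIs instead.
RUN apt-get update && apt-get install -y --no-install-recommends \
        ffmpeg python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

COPY worker.py /app/worker.py
WORKDIR /app
CMD ["python3", "worker.py"]
```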
Worker Templates
Jetson Orin NX worker (ARM64)
Kind: Deployment
Image: registry.internal/ffmpeg-worker:latest-arm64
GPU count: 1
VRAM: 8Gi
CPU: 4
Memory: 8Gi
Node selector: { "nvidia.com/gpu.product": "Orin" }

| Variable | Example | Description |
|---|---|---|
| QUEUE_URL | redis://redis.mesh:6379 | Job queue connection |
| QUEUE_NAME | transcode-jobs | Queue name |
| INPUT_BUCKET | s3://media-ingest | Source video bucket |
| OUTPUT_BUCKET | s3://media-output | Encoded output bucket |
| PRESET | hls-ladder | Encoding profile |
| CONCURRENCY | 3 | Parallel encode sessions per node |
| HWACCEL | cuda | Hardware acceleration |
| NVENC_PRESET | p4 | NVENC quality/speed (p1–p7) |
| S3_ENDPOINT_URL | http://minio.mesh:9000 | Internal MinIO endpoint |
Sensitive values (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, queue password) come from Vault via secret_refs.
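A worker can assemble its configuration directly from these variables at startup. The sketch below reads the names from the table above; the dataclass itself is an illustration, not part of PodWarden, and QUEUE_URL is treated as required so a misconfigured pod fails fast.

```python
# Sketch: assemble worker configuration from the environment variables
# in the table above. The dataclass is illustrative, not PodWarden API.
import os
from dataclasses import dataclass

@dataclass
class WorkerConfig:
    queue_url: str
    queue_name: str
    input_bucket: str
    output_bucket: str
    preset: str
    concurrency: int
    nvenc_preset: str

def load_config(env=os.environ) -> WorkerConfig:
    return WorkerConfig(
        queue_url=env["QUEUE_URL"],  # required: fail fast if absent
        queue_name=env.get("QUEUE_NAME", "transcode-jobs"),
        input_bucket=env.get("INPUT_BUCKET", "s3://media-ingest"),
        output_bucket=env.get("OUTPUT_BUCKET", "s3://media-output"),
        preset=env.get("PRESET", "hls-ladder"),
        concurrency=int(env.get("CONCURRENCY", "3")),
        nvenc_preset=env.get("NVENC_PRESET", "p4"),
    )
```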
x86 GPU server worker (amd64)
For heavier jobs — multi-stream 4K, HDR tonemapping, complex filter graphs, ProRes output — use a traditional GPU server:
Kind: Deployment
Image: registry.internal/ffmpeg-worker:latest-amd64
GPU count: 1
VRAM: 16Gi
CPU: 16
Memory: 32Gi
Node selector: { "kubernetes.io/arch": "amd64" }

Same environment variables, different image architecture and node selector. Both worker types coexist in the same cluster. The queue routes job types to the appropriate workers.
Volume mounts
| Path | Volume type | Purpose |
|---|---|---|
| /tmp/transcode | emptyDir | Working directory for in-flight segments |
Source video and output are handled via S3 API calls from within the worker — no persistent mounts needed unless you're using NFS for source files.
Multi-Profile Deployments
Maintain separate stacks for each encoding profile:
| Profile | Target | Notes |
|---|---|---|
| HLS adaptive bitrate | Web playback | 1080p/720p/480p/360p ladder, fMP4 segments |
| Broadcast archive | Long-term storage | ProRes 422 HQ or DNxHR — CPU-encoded on x86 nodes |
| Social clips | Short-form platforms | H.264/AV1, vertical and square crops |
| Proxy generation | Editorial workflows | Low-res H.264, fast encode for NLE preview |
Each profile is a separate stack with a different PRESET value. Deploy all profiles to the same cluster. The queue routes job types to the matching worker — Jetson nodes handle the volume, x86 GPU servers handle the exceptions.
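One simple way to implement that routing is a per-profile queue name, so each worker stack only polls its own queue. The profile keys below follow the table above; the `transcode-jobs:<profile>` naming convention and the jetson/x86 split are assumptions of this sketch.

```python
# Sketch: route each job to a per-profile queue so the matching worker
# stack picks it up. The queue-name convention is an assumption.
PROFILES = {
    "hls-ladder":        "jetson",  # volume work: Jetson NVENC workers
    "social-clips":      "jetson",
    "proxy":             "jetson",
    "broadcast-archive": "x86",     # ProRes/DNxHR: CPU-encoded on x86 nodes
}

def queue_for(profile: str) -> str:
    if profile not in PROFILES:
        raise ValueError(f"unknown profile {profile!r}")
    return f"transcode-jobs:{profile}"
```

A dispatcher pushes each incoming job onto `queue_for(job["preset"])`; each worker stack sets QUEUE_NAME to its own profile's queue.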
Multi-Profile Routing
Scaling the Fleet
Jetson nodes are inexpensive enough that horizontal scaling is usually the right answer. Add nodes, join the cluster, workers start picking up jobs automatically. No queue reconfiguration, no storage changes.
For temporary capacity spikes, rent cloud GPU nodes, join them to the cluster with an x86 worker template, and remove them when the backlog clears.
Scaling Lifecycle
Job kind for batch processing — For a one-time migration or catalogue re-encode, use kind: Job instead of a Deployment. The job processes the queue and stops when complete. PodWarden records run duration and exit code. Batch jobs don't idle after finishing — important when running on rented nodes.
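The drain-and-exit behaviour of such a batch worker can be sketched as a loop that stops once the queue has been empty for a timeout. As before, `queue` is anything with a redis-py-style `blpop`, and the timeout threshold is an assumption.

```python
# Sketch: a kind: Job batch worker drains the queue and exits instead of
# idling -- relevant when the nodes are rented. `queue` needs a
# redis-py-style blpop(name, timeout=...) method.
def drain(queue, handle, queue_name="transcode-jobs", idle_timeout=30):
    """Process jobs until blpop times out (backlog cleared), then return."""
    processed = 0
    while True:
        item = queue.blpop(queue_name, timeout=idle_timeout)
        if item is None:  # queue empty for idle_timeout seconds: done
            return processed
        _, raw = item
        handle(raw)
        processed += 1
```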
Networking
Jetson nodes behind NAT (home labs, edge deployments) connect via Tailscale mesh — no public IP needed. PodWarden auto-detects Tailscale-connected nodes and tags them mesh. The MinIO, Redis, and Vault instances on your mesh are reachable from all worker nodes.
For latency-sensitive live transcoding, co-locate Jetson nodes with your ingest infrastructure on the same LAN. Tag those nodes lan and set the worker template to require lan connectivity — PodWarden schedules only on nodes that can reach the ingest source.
Hub Templates for This Stack
| Template | Role |
|---|---|
| FFmpeg worker (Jetson NVENC) | ARM64 NVENC/NVDEC worker |
| FFmpeg worker (x86 NVENC) | amd64 GPU-accelerated worker |
| FFmpeg worker (CPU) | Software encoding, any architecture |
| MinIO | S3-compatible object storage |
| RustFS | High-performance S3 object storage |
| Redis | Job queue |
| RabbitMQ | Durable job queue with dead-letter support |
| PostgreSQL | Job tracking database |
| Vault | Secrets management |
| Prometheus | Metrics collection |
| Grafana | Transcoding pipeline dashboards |
| DCGM Exporter | Per-GPU metrics for x86 nodes (DaemonSet) |
| Tegrastats Exporter | Per-GPU metrics for Jetson nodes (DaemonSet) |
| Harbor | Private container registry for custom ARM64 images |
The complete pipeline — storage, queue, workers, monitoring — runs on your own nodes, managed from one dashboard. No external dependencies unless you choose them.