Machine:
ssh [email protected] -p 31138
- CUDA Programming Course – High-Performance Computing with GPUs
- Programming Massively Parallel Processors
- GPU Puzzles
GPU Kernel Programming
Let’s write some fast numeric code. Concerns of compute, memory, cache, and data movement come together in the pursuit of performance.
(Links below don’t include computer graphics—that’s a huge topic.)
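To make the interplay of compute and data movement concrete, here is a minimal roofline-style sketch in plain Python. It estimates whether a kernel is likely limited by arithmetic throughput or by memory bandwidth. The peak figures (`PEAK_FLOPS`, `PEAK_BANDWIDTH`) are hypothetical placeholders, not the specs of any real GPU, and the helper names are invented for illustration. The matmul traffic figure assumes perfect reuse (each matrix touched once), which real kernels only approach through tiling and caching.

```python
# Roofline-style estimate: is a kernel limited by compute or by memory traffic?
# The peak numbers are hypothetical placeholders, not a real GPU's specs.

PEAK_FLOPS = 10e12       # assumed peak compute, FLOP/s
PEAK_BANDWIDTH = 500e9   # assumed peak DRAM bandwidth, bytes/s


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved to or from DRAM."""
    return flops / bytes_moved


def attainable_flops(intensity: float) -> float:
    """Roofline bound: the lower of the compute roof and the bandwidth roof."""
    return min(PEAK_FLOPS, intensity * PEAK_BANDWIDTH)


def is_memory_bound(intensity: float) -> bool:
    """True when the bandwidth roof sits below the compute roof."""
    return intensity * PEAK_BANDWIDTH < PEAK_FLOPS


def saxpy_intensity(n: int, elem_bytes: int = 4) -> float:
    # y = a*x + y: 2 FLOPs per element; read x, read y, write y.
    return arithmetic_intensity(2 * n, 3 * n * elem_bytes)


def matmul_intensity(n: int, elem_bytes: int = 4) -> float:
    # C = A @ B with n x n matrices: 2*n^3 FLOPs.
    # Idealized traffic: read A and B once, write C once.
    return arithmetic_intensity(2 * n**3, 3 * n * n * elem_bytes)


if __name__ == "__main__":
    for name, ai in [("saxpy", saxpy_intensity(1 << 20)),
                     ("matmul 4096", matmul_intensity(4096))]:
        kind = "memory-bound" if is_memory_bound(ai) else "compute-bound"
        print(f"{name}: {ai:.2f} FLOP/byte, {kind}, "
              f"bound {attainable_flops(ai) / 1e12:.2f} TFLOP/s")
```

Under these assumed peaks, SAXPY does about 1/6 FLOP per byte and sits well under the bandwidth roof, while a large matmul has enough reuse to reach the compute roof. That gap is why kernel work so often focuses on data movement rather than arithmetic.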
Getting Started
Extending ML Frameworks
- PyTorch: Custom Ops with CUDA
- JAX: Custom Ops with CUDA
- Triton
- Pallas (JAX)
- ggml-cuda
- tfjs-backend-webgl
- tfjs-backend-webgpu
Modern Toolchains
ML Model Formats