

🚀 GPU Programming Reading Group

🎯 Goal

Demystify GPU programming through hands-on learning, group discussion, and concrete implementations, building up from basic concepts to real CUDA kernels of the kind that power modern AI and scientific computing.


✅ Prerequisites

To get the most out of this group, you should be comfortable with:

  • 💻 Python & C++: Core syntax and basic memory operations

  • 📐 Linear Algebra: Especially vectors and matrix multiplication (dot product, row/column transformations)


🧠 Core Concepts We’ll Explore

We’ll break down the GPU execution model and understand how it maps to real workloads:

  • Kernels – your code that runs on the GPU

  • Threads – the smallest unit of execution

  • Blocks – groups of threads

  • Grids – groups of blocks

  • Warps – groups of 32 threads (on NVIDIA GPUs) that the hardware schedules together; the unit to reason about when tuning performance
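
To make the hierarchy concrete, here is a minimal sketch in which every thread records its own position in the grid, its warp within the block, and its lane within the warp. The names `record_ids`, `Ids`, and `launch_record_ids` are made up for illustration, the launch shape (2 blocks of 64 threads) is an arbitrary choice, and error checking on the CUDA calls is left out for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Kernel: each thread records where it sits in the grid / block / warp hierarchy.
__global__ void record_ids(int *global_id, int *warp_id, int *lane_id, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique index across the whole grid
    if (i >= n) return;                             // guard in case the grid overshoots n
    int warp = threadIdx.x / warpSize;              // warp index within this block
    int lane = threadIdx.x % warpSize;              // position within the warp
    global_id[i] = i;
    warp_id[i]   = warp;
    lane_id[i]   = lane;
    if (lane == 0)  // one debug line per warp; print order is not deterministic
        printf("block %d, warp %d starts at global thread %d\n", blockIdx.x, warp, i);
}

struct Ids { std::vector<int> global, warp, lane; };

// Hypothetical host helper: launches a 1D grid of blocks * threads_per_block threads.
Ids launch_record_ids(int blocks, int threads_per_block) {
    int n = blocks * threads_per_block;
    size_t bytes = n * sizeof(int);
    int *d_g, *d_w, *d_l;
    cudaMalloc(&d_g, bytes);
    cudaMalloc(&d_w, bytes);
    cudaMalloc(&d_l, bytes);

    record_ids<<<blocks, threads_per_block>>>(d_g, d_w, d_l, n);
    cudaDeviceSynchronize();  // wait for the kernel and flush device-side printf

    Ids out{std::vector<int>(n), std::vector<int>(n), std::vector<int>(n)};
    cudaMemcpy(out.global.data(), d_g, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(out.warp.data(),   d_w, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(out.lane.data(),   d_l, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_g);
    cudaFree(d_w);
    cudaFree(d_l);
    return out;
}

int main() {
    Ids ids = launch_record_ids(2, 64);  // grid of 2 blocks, 64 threads per block
    // Global thread 100 is threadIdx.x = 36 in block 1, so with 32-thread warps
    // it falls in warp 1 at lane 4.
    printf("thread 100: warp %d, lane %d\n", ids.warp[100], ids.lane[100]);
    return 0;
}
```

The expression `blockIdx.x * blockDim.x + threadIdx.x` is the usual way to turn a (block, thread) pair into a flat index, and it shows up in almost every 1D kernel.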

We’ll also look at memory types (global, shared, local) and data transfer patterns between host (CPU) and device (GPU).
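
As a rough illustration of those memory spaces and of a host-to-device round trip, the sketch below reverses each 256-element chunk of an array by staging it in shared memory. The kernel name `reverse_chunks`, the helper `reverse_on_gpu`, and the block size are assumptions chosen for the example; the helper assumes the input length is a multiple of the block size and skips error checking.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // threads per block (an illustrative choice)

// Reverses each BLOCK-sized chunk of `in` and writes the result to `out`.
__global__ void reverse_chunks(const float *in, float *out) {
    __shared__ float tile[BLOCK];         // shared memory: one copy per block
    int t    = threadIdx.x;               // per-thread variables normally live in
    int base = blockIdx.x * BLOCK;        // registers and spill to local memory if needed
    tile[t] = in[base + t];               // global memory -> shared memory
    __syncthreads();                      // wait until the whole chunk is staged
    out[base + t] = tile[BLOCK - 1 - t];  // shared memory -> global memory
}

// Hypothetical host helper; assumes in.size() is a multiple of BLOCK.
std::vector<float> reverse_on_gpu(const std::vector<float> &in) {
    size_t bytes = in.size() * sizeof(float);
    int blocks = static_cast<int>(in.size() / BLOCK);
    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);   // allocate global memory on the device
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);    // host -> device

    reverse_chunks<<<blocks, BLOCK>>>(d_in, d_out);

    std::vector<float> out(in.size());
    // This blocking copy runs after the kernel on the default stream.
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);  // device -> host
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}

int main() {
    std::vector<float> in(512);
    for (int i = 0; i < 512; ++i) in[i] = static_cast<float>(i);
    std::vector<float> out = reverse_on_gpu(in);
    // Each 256-element chunk is reversed independently of the other.
    printf("out[0] = %g, out[255] = %g, out[256] = %g\n", out[0], out[255], out[256]);
    return 0;
}
```

The allocate, copy in, launch, copy out, free sequence in `reverse_on_gpu` is the basic transfer pattern; the `__syncthreads()` call is what makes it safe for one thread to read a shared-memory slot written by another.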


📚 Learning Materials

CUDA Course (Chapters 1–5)


🛠️ Hands-On Projects

During the reading group, we’ll read, build, and debug CUDA code together. Planned exercises:

  1. Hello CUDA World

    • Write and launch your first CUDA kernel

    • Understand thread hierarchy via simple debug output

  2. Vector Addition

    • Launch many threads to compute in parallel

    • Compare GPU vs CPU performance

  3. Matrix Multiplication

    • Implement dense matmul using global memory

    • Optimize it step-by-step using tiling and shared memory

    • Optional: explore warp-level primitives for further speed-up
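
For the tiling step of the matrix multiplication exercise, one possible shape of the kernel is sketched below. It is not taken from the course material: the names `matmul_tiled` and `matmul_gpu`, the 16x16 tile size, and the restriction to square row-major matrices are all assumptions made to keep the example short, and error checking is omitted.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 16;  // tile edge length (an illustrative choice)

// C = A * B for row-major n x n matrices; each thread computes one element of C.
__global__ void matmul_tiled(const float *A, const float *B, float *C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        int a_col = t * TILE + threadIdx.x;
        int b_row = t * TILE + threadIdx.y;
        // Stage one tile of A and one tile of B; pad with zeros past the matrix edge.
        As[threadIdx.y][threadIdx.x] = (row < n && a_col < n) ? A[row * n + a_col] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (b_row < n && col < n) ? B[b_row * n + col] : 0.0f;
        __syncthreads();  // both tiles fully loaded before anyone reads them

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // finish reading before the next iteration overwrites the tiles
    }
    if (row < n && col < n) C[row * n + col] = acc;
}

// Hypothetical host helper: copies A and B to the device, runs the kernel, returns C.
std::vector<float> matmul_gpu(const std::vector<float> &A, const std::vector<float> &B, int n) {
    size_t bytes = static_cast<size_t>(n) * n * sizeof(float);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);  // round up to cover the edges
    matmul_tiled<<<grid, block>>>(dA, dB, dC, n);

    std::vector<float> C(static_cast<size_t>(n) * n);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}

int main() {
    int n = 100;  // deliberately not a multiple of TILE, to exercise the edge handling
    std::vector<float> A(n * n), B(n * n);
    for (int i = 0; i < n * n; ++i) {
        A[i] = static_cast<float>(i % 7);
        B[i] = static_cast<float>(i % 5);
    }
    std::vector<float> C = matmul_gpu(A, B, n);

    // Compare against a straightforward CPU triple loop.
    float max_err = 0.0f;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            float ref = 0.0f;
            for (int k = 0; k < n; ++k) ref += A[i * n + k] * B[k * n + j];
            max_err = std::fmax(max_err, std::fabs(ref - C[i * n + j]));
        }
    printf("max abs error vs CPU: %g\n", max_err);
    return 0;
}
```

The gain over a global-memory-only kernel comes from reuse: each value loaded into `As` or `Bs` is read by `TILE` threads from shared memory instead of being fetched from global memory `TILE` times. Checking against the CPU loop on a size that is not a multiple of `TILE` is a cheap way to catch indexing mistakes at the matrix edges.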


✨ Outcomes

By the end of this group, you’ll:

  • Understand the mental model of GPU programming

  • Be comfortable writing simple CUDA kernels from scratch

  • Know how to optimize memory access and thread usage

  • Be ready to explore more advanced tools like Triton and CUTLASS, or to write custom ops for ML frameworks

