GPU Study Group - Session 1 Quick Reference

🔑 Key Concepts Cheat Sheet

GPU vs CPU Architecture

| GPU | CPU |
|---|---|
| Thousands of simple cores | 4-16 complex cores |
| High throughput | Low latency |
| SIMT execution | SIMD/scalar execution |
| Optimized for parallel tasks | Optimized for sequential tasks |
| High memory bandwidth | Large cache hierarchy |

CUDA Terminology

| Term | Definition | Example |
|---|---|---|
| Host | CPU and its memory | Where `main()` runs |
| Device | GPU and its memory | Where kernels run |
| Kernel | Function that runs on the GPU | `__global__ void myKernel()` |
| Thread | Smallest unit of execution | One instance of the kernel body |
| Block | Group of threads | Up to 1024 threads |
| Grid | Collection of blocks | All blocks for one kernel launch |
| Warp | 32 threads executing together | Hardware scheduling unit |

Thread Hierarchy

Grid
├── Block (0,0)
│   ├── Thread (0,0)
│   ├── Thread (0,1)
│   └── Thread (0,2)
├── Block (0,1)
│   ├── Thread (0,0)
│   └── Thread (0,1)
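The hierarchy above maps directly onto built-in variables inside a kernel. A minimal sketch (the kernel name `whoAmI` is illustrative, not from the session materials):

```cuda
// Each thread derives its unique global index from the built-in
// hierarchy variables (1-D case):
__global__ void whoAmI(int *out, int n) {
    int globalIdx = blockIdx.x * blockDim.x + threadIdx.x;
    if (globalIdx < n) {          // guard: the grid may have extra threads
        out[globalIdx] = globalIdx;
    }
}
```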

Memory Hierarchy (Fast → Slow)

  1. Registers - Per-thread, fastest
  2. Shared Memory - Per-block, fast
  3. Global Memory - All threads, slow but large
  4. Host Memory - CPU RAM, slowest access from GPU
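One kernel can touch the first three levels of this hierarchy. A sketch, assuming `blockDim.x <= TILE` (the name `scaleWithTile` and the constant `TILE` are illustrative):

```cuda
#define TILE 256

__global__ void scaleWithTile(const float *in, float *out, int n) {
    __shared__ float tile[TILE];            // 2. shared memory: per-block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float x = (i < n) ? in[i] : 0.0f;       // 3. global memory read (slow)
    tile[threadIdx.x] = x;                  // stage into shared memory
    __syncthreads();                        // whole block must reach this point
    if (i < n)
        out[i] = 2.0f * tile[threadIdx.x];  // 1. registers hold x and i (fastest)
}
```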

Common CUDA Function Patterns

// Kernel declaration
__global__ void kernelName(parameters) { }
 
// Kernel launch
kernelName<<<gridSize, blockSize>>>(arguments);
 
// Synchronization
cudaDeviceSynchronize();
 
// Error checking
cudaError_t err = cudaGetLastError();
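A minimal host program combining these patterns into one runnable sketch (the kernel name `addOne` is illustrative; error handling is abbreviated for space):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Adds 1 to each element; one thread per element.
__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main(void) {
    const int n = 256;
    int host[n], *dev;
    for (int i = 0; i < n; i++) host[i] = i;

    cudaMalloc(&dev, n * sizeof(int));                      // device allocation
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);

    addOne<<<(n + 127) / 128, 128>>>(dev, n);               // kernel launch
    cudaError_t err = cudaGetLastError();                   // check launch errors
    if (err != cudaSuccess) printf("launch: %s\n", cudaGetErrorString(err));
    cudaDeviceSynchronize();                                // wait for the GPU

    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("host[0] = %d\n", host[0]);
    cudaFree(dev);
    return 0;
}
```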

🗣️ Discussion Facilitation Quick Guides

Opening Questions

  • “What’s your experience with parallel programming?”
  • “What GPU applications have you used as an end user?”
  • “What questions came up while reading the materials?”

When Someone Says “I Don’t Understand…”

  1. Ask for specifics: “Which part exactly?”
  2. Use analogies: “Think of it like…”
  3. Draw it out: Visual representation
  4. Get group help: “Who can explain this differently?”

Redirecting Off-Topic Discussion

  • “That’s fascinating - how does it connect to GPU architecture?”
  • “Let’s bookmark that for our advanced topics session”
  • “Great question for our follow-up discussion”

⚡ Technical Quick Fixes

CUDA Not Found

# Check installation
which nvcc
echo $PATH
export PATH=/usr/local/cuda/bin:$PATH

Compilation Issues

# Basic compilation
nvcc hello_cuda.cu -o hello_cuda
 
# With debug info
nvcc -g -G hello_cuda.cu -o hello_cuda
 
# Specify architecture (replace sm_50 with your GPU's compute capability)
nvcc -arch=sm_50 hello_cuda.cu -o hello_cuda

No GPU Available

  • Backup: Google Colab with GPU runtime
  • Alternative: Show host screen for live demo
  • Workaround: CPU-only version to show concept
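For the CPU-only workaround, a serial loop can stand in for the kernel so the concept still comes across without a GPU. A sketch (`vecAddCPU` is an illustrative name):

```c
// CPU-only stand-in for a vector-add kernel launch: the loop index i
// plays the role of the global thread index in the grid.
void vecAddCPU(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i++)   // each iteration = one "thread"
        c[i] = a[i] + b[i];
}
```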

📊 Progress Tracking

Session Success Indicators

  • Everyone can run nvcc --version successfully
  • At least 80% complete the Hello CUDA exercise
  • Group can explain GPU vs CPU difference
  • Participants ask relevant follow-up questions
  • Technical issues resolved or documented for follow-up

Red Flags

  • Multiple people struggling with basic installation
  • Confusion about fundamental concepts (host/device)
  • No questions being asked
  • Time running significantly over/under

Adaptation Strategies

  • Too fast: Add more discussion time, deeper exercises
  • Too slow: Skip advanced questions, focus on core concepts
  • Technical issues: Pair programming, shared screen demos
  • Mixed levels: Advanced participants help beginners