GPU Study Group - Session 1 Quick Reference

🔑 Key Concepts Cheat Sheet

GPU vs CPU Architecture

| GPU | CPU |
|---|---|
| Thousands of simple cores | 4-16 complex cores |
| High throughput | Low latency |
| SIMT execution | SIMD/scalar execution |
| Optimized for parallel tasks | Optimized for sequential tasks |
| High memory bandwidth | Large cache hierarchy |

CUDA Terminology

| Term | Definition | Example |
|---|---|---|
| Host | CPU and its memory | Where `main()` runs |
| Device | GPU and its memory | Where kernels run |
| Kernel | Function that runs on the GPU | `__global__ void myKernel()` |
| Thread | Smallest unit of execution | One instance of the kernel body |
| Block | Group of threads | Up to 1024 threads |
| Grid | Collection of blocks | All blocks for one kernel launch |
| Warp | 32 threads executing together | Hardware scheduling unit |

Thread Hierarchy

Grid
├── Block (0,0)
│   ├── Thread (0,0)
│   ├── Thread (0,1)
│   └── Thread (0,2)
├── Block (0,1)
│   ├── Thread (0,0)
│   └── Thread (0,1)
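The hierarchy above maps directly onto built-in variables inside a kernel. A minimal sketch (the kernel name `whoAmI` is illustrative, not from the session materials):

```cuda
// Each thread derives its unique global index from the built-in
// hierarchy variables (1-D case):
__global__ void whoAmI(int *out, int n) {
    int globalIdx = blockIdx.x * blockDim.x + threadIdx.x;
    if (globalIdx < n) {          // guard: the grid may have extra threads
        out[globalIdx] = globalIdx;
    }
}
```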

Memory Hierarchy (Fast → Slow)

  1. Registers - Per-thread, fastest
  2. Shared Memory - Per-block, fast
  3. Global Memory - All threads, slow but large
  4. Host Memory - CPU RAM, slowest access from GPU
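One kernel can touch the first three levels of this hierarchy. A sketch, assuming `blockDim.x <= TILE` (the name `scaleWithTile` and the constant `TILE` are illustrative):

```cuda
#define TILE 256

__global__ void scaleWithTile(const float *in, float *out, int n) {
    __shared__ float tile[TILE];            // 2. shared memory: per-block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float x = (i < n) ? in[i] : 0.0f;       // 3. global memory read (slow)
    tile[threadIdx.x] = x;                  // stage into shared memory
    __syncthreads();                        // whole block must reach this point
    if (i < n)
        out[i] = 2.0f * tile[threadIdx.x];  // 1. registers hold x and i (fastest)
}
```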

Common CUDA Function Patterns

// Kernel declaration
__global__ void kernelName(parameters) { }
 
// Kernel launch
kernelName<<<gridSize, blockSize>>>(arguments);
 
// Synchronization
cudaDeviceSynchronize();
 
// Error checking
cudaError_t err = cudaGetLastError();
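A minimal host program combining these patterns into one runnable sketch (the kernel name `addOne` is illustrative; error handling is abbreviated for space):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Adds 1 to each element; one thread per element.
__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main(void) {
    const int n = 256;
    int host[n], *dev;
    for (int i = 0; i < n; i++) host[i] = i;

    cudaMalloc(&dev, n * sizeof(int));                      // device allocation
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);

    addOne<<<(n + 127) / 128, 128>>>(dev, n);               // kernel launch
    cudaError_t err = cudaGetLastError();                   // check launch errors
    if (err != cudaSuccess) printf("launch: %s\n", cudaGetErrorString(err));
    cudaDeviceSynchronize();                                // wait for the GPU

    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("host[0] = %d\n", host[0]);
    cudaFree(dev);
    return 0;
}
```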

🗣️ Discussion Facilitation Quick Guides

Opening Questions

  • “What’s your experience with parallel programming?”
  • “What GPU applications have you used as an end user?”
  • “What questions came up while reading the materials?”

When Someone Says “I Don’t Understand…”

  1. Ask for specifics: “Which part exactly?”
  2. Use analogies: “Think of it like…”
  3. Draw it out: Visual representation
  4. Get group help: “Who can explain this differently?”

Redirecting Off-Topic Discussion

  • “That’s fascinating - how does it connect to GPU architecture?”
  • “Let’s bookmark that for our advanced topics session”
  • “Great question for our follow-up discussion”

⚡ Technical Quick Fixes

CUDA Not Found

# Check installation
which nvcc
echo $PATH
export PATH=/usr/local/cuda/bin:$PATH

Compilation Issues

# Basic compilation
nvcc hello_cuda.cu -o hello_cuda
 
# With debug info
nvcc -g -G hello_cuda.cu -o hello_cuda
 
# Specify architecture (replace sm_50 with your GPU's compute capability)
nvcc -arch=sm_50 hello_cuda.cu -o hello_cuda

No GPU Available

  • Backup: Google Colab with GPU runtime
  • Alternative: Show host screen for live demo
  • Workaround: CPU-only version to show concept
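For the CPU-only workaround, a serial loop can stand in for the kernel so the concept still comes across without a GPU. A sketch (`vecAddCPU` is an illustrative name):

```c
// CPU-only stand-in for a vector-add kernel launch: the loop index i
// plays the role of the global thread index in the grid.
void vecAddCPU(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i++)   // each iteration = one "thread"
        c[i] = a[i] + b[i];
}
```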

📊 Progress Tracking

Session Success Indicators

  • Everyone can run nvcc --version successfully
  • At least 80% complete the Hello CUDA exercise
  • Group can explain GPU vs CPU difference
  • Participants ask relevant follow-up questions
  • Technical issues resolved or documented for follow-up

Red Flags

  • Multiple people struggling with basic installation
  • Confusion about fundamental concepts (host/device)
  • No questions being asked
  • Time running significantly over/under

Adaptation Strategies

  • Too fast: Add more discussion time, deeper exercises
  • Too slow: Skip advanced questions, focus on core concepts
  • Technical issues: Pair programming, shared screen demos
  • Mixed levels: Advanced participants help beginners