GPU Study Group - Session 1: Detailed Host Plan

Week 1: GPU Architecture & CUDA Setup

📋 Pre-Session Preparation Checklist (Host)

Technical Setup (Do 1-2 days before)

  • Test your own CUDA environment - Run all three exercises yourself
  • Prepare backup options for attendees without CUDA:
    • Google Colab notebook with CUDA runtime
    • Online CUDA compiler link (godbolt.org)
    • Virtual machine with CUDA pre-installed
  • Screen sharing setup - Test ability to share terminal and IDE
  • Prepare demo materials - Have deviceQuery output ready to show

Content Preparation

  • Review all assigned materials - Take notes on key concepts
  • Prepare visual aids (optional):
    • GPU vs CPU architecture diagram
    • CUDA memory hierarchy diagram
    • Thread hierarchy visualization
  • Create shared document for group notes/questions

Logistics

  • Send reminder 24 hours before with:
    • Meeting link and time
    • Reminder to complete readings
    • CUDA installation instructions (if not done)
  • Prepare attendance tracking and progress monitoring

🕐 Detailed Session Agenda (90 minutes)

Opening (10 minutes)

[0:00-0:10]

Host Script:

“Welcome to our first GPU programming study group! Over the next 12 weeks, we’ll go from CUDA beginners to writing efficient GPU kernels. Today we’re building the foundation - understanding WHY GPUs work the way they do.”

Activities:

  • Quick introductions (name, background, why interested in GPU programming)
  • Outline today’s agenda
  • Set expectations for participation

Knowledge Check & Reading Review (15 minutes)

[0:10-0:25]

Host facilitation: Start with quick polls to gauge preparation level:

  1. “Show of hands - who completed all the readings?”
  2. “Who successfully installed CUDA and ran deviceQuery?”
  3. “What was the most confusing concept from the readings?”

Quick concept review (about 3 minutes each, to fit the 15-minute slot):

  • GPU vs CPU architecture - Have someone explain in their own words
  • CUDA terminology - Quick definitions round-robin
  • Development workflow - Walk through compile and run process

Core Discussion: Architecture Deep Dive (25 minutes)

[0:25-0:50]

Discussion Point 1: “Why do GPUs excel at parallel tasks that CPUs struggle with?” (8 minutes)

Facilitation approach:

  • Start with open responses, then guide toward key concepts
  • Look for these concepts to emerge:
    • Thousands of simple cores vs. a handful of complex CPU cores
    • SIMT execution model
    • Memory bandwidth vs latency
    • Different design philosophies

Guiding questions if discussion stalls:

  • “What happens when a CPU encounters a cache miss?”
  • “How does GPU handle the same situation differently?”
  • “When would you NOT want to use GPU acceleration?”

Discussion Point 2: “How does the SIMT execution model impact algorithm design?” (8 minutes)

Key concepts to draw out:

  • All threads in warp execute same instruction
  • Branch divergence performance impact
  • Data parallelism vs task parallelism
  • Algorithm restructuring for GPU efficiency
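To make branch divergence concrete during the discussion, a minimal sketch (illustrative, not from the assigned readings) can be shown: when threads in the same warp take different branches, the hardware serializes the two paths with some lanes masked off.

```cuda
// Sketch: branch divergence inside a warp (names and values are illustrative).
__global__ void divergent(float *out) {
    int i = threadIdx.x;
    if (i % 2 == 0) {
        out[i] = i * 2.0f;   // even lanes take this path...
    } else {
        out[i] = i + 1.0f;   // ...odd lanes take this one, executed serially after it
    }
}

// A divergence-free alternative: every lane executes the same instruction stream.
__global__ void uniform(float *out) {
    int i = threadIdx.x;
    out[i] = i * 2.0f;
}
```

Walking through both kernels side by side helps the group see why restructuring an algorithm so that neighboring threads follow the same path matters for performance.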

Real-world examples to mention:

  • Matrix multiplication (perfect fit)
  • Sorting algorithms (requires redesign)
  • Tree traversal (challenging)
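If the group wants to see why matrix multiplication is such a natural fit, a naive kernel sketch can be shown: one thread computes one output element, with no dependence on other threads. This assumes square N x N row-major matrices; the names are illustrative.

```cuda
// Naive matrix multiply: one thread per output element, a natural SIMT workload.
__global__ void matmul(const float *A, const float *B, float *C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {          // guard against threads outside the matrix
        float sum = 0.0f;
        for (int k = 0; k < N; ++k)
            sum += A[row * N + k] * B[k * N + col];
        C[row * N + col] = sum;
    }
}
```

Contrast this with sorting or tree traversal, where threads must coordinate and follow data-dependent paths, which is exactly why those algorithms need redesign for the GPU.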

Discussion Point 3: “What development challenges do you anticipate?” (9 minutes)

Expected responses:

  • Debugging parallel code
  • Memory management complexity
  • Performance optimization difficulty
  • Different programming model

Host role: Acknowledge challenges but emphasize they’re solvable


Hands-On Exercise Session (30 minutes)

[0:50-1:20]

Exercise 1.1: Environment Verification (8 minutes)

[0:50-0:58]

Host demonstration:

  1. Show nvcc --version output on your system
  2. Demonstrate nvidia-smi and explain key information
  3. Walk through deviceQuery compilation and key output metrics

Group activity:

  • Everyone runs commands simultaneously
  • Troubleshoot issues collectively
  • Share interesting deviceQuery findings

Exercise 1.2: Hello CUDA Analysis (12 minutes)

[0:58-1:10]

Before coding - concept check (3 minutes):

  • “What does <<<2, 4>>> mean?”
  • “How many total threads will be created?”
  • “What output do we expect?”

Live coding session (6 minutes):

  • Host shares screen and codes exercise
  • Explain each line as you type
  • Compile and run, show output
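The live-coded exercise might look like the following sketch (one reasonable version, assuming nvcc is on the PATH):

```cuda
// hello.cu - compile and run with: nvcc hello.cu -o hello && ./hello
#include <cstdio>

__global__ void hello() {
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
    hello<<<2, 4>>>();          // 2 blocks x 4 threads each = 8 threads total
    cudaDeviceSynchronize();    // wait for the kernel so device-side printf flushes
    return 0;
}
```

Pointing out the `cudaDeviceSynchronize()` call is worthwhile: without it, the program can exit before the kernel's output appears.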

Experimentation phase (3 minutes):

  • Challenge: “Try <<<3, 2>>> and <<<1, 8>>>”
  • “What patterns do you notice in the output?”
  • “What happens with <<<1, 1024>>>?”

Exercise 1.3: Hardware Exploration (10 minutes)

[1:10-1:20]

Structured sharing:

  • Each person shares one interesting spec from their GPU
  • Create group comparison chart on shared doc
  • Discuss implications for programming:
    • “Why does compute capability matter?”
    • “How does memory size affect problem size?”
    • “What does max threads per block tell us?”
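If deviceQuery is unavailable on someone's machine, a lightweight alternative using the CUDA runtime API can surface the same specs the group is comparing (a minimal sketch; it queries device 0 only and skips error checking):

```cuda
// query.cu - compile with: nvcc query.cu -o query && ./query
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // properties of device 0
    printf("Name: %s\n", prop.name);
    printf("Compute capability: %d.%d\n", prop.major, prop.minor);
    printf("Global memory: %.1f GiB\n",
           prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("Warp size: %d\n", prop.warpSize);
    return 0;
}
```

These fields map directly onto the three discussion questions: compute capability, memory size, and max threads per block.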

Wrap-up & Next Week Preview (10 minutes)

[1:20-1:30]

Key Takeaways Review (5 minutes)

Ask group to complete these statements:

  • “GPUs are good at _________ because _________”
  • “The biggest difference from CPU programming is _________”
  • “CUDA’s thread hierarchy consists of _________”

Week 2 Preview (3 minutes)

  • Topic: Memory Management & Basic Kernels
  • Key focus: Understanding GPU memory types and writing first computational kernels
  • Preparation reminder: Send materials list via email

Action Items (2 minutes)

  • Complete any exercises not finished
  • Set up development environment if needed
  • Optional: Explore CUDA samples directory

🎯 Discussion Facilitation Guide

Effective Facilitation Techniques

For Quiet Groups:

  • Direct questions: “Sarah, what did you think about the memory hierarchy?”
  • Think-pair-share: 30 seconds to think, discuss with neighbor, then share
  • Written responses: Use shared doc for anonymous input

For Dominant Speakers:

  • Redirect: “That’s a great point, John. Let’s hear other perspectives.”
  • Time limits: “Let’s get 2-3 more opinions on this”
  • Round-robin: Structure equal speaking time

When Discussion Goes Off-Track:

  • Acknowledge: “Interesting point - let’s revisit that during break”
  • Redirect: “How does this connect to our GPU architecture focus?”
  • Table it: “Let’s add that to our ‘questions for later’ list”

Expected Learning Outcomes

By end of session, participants should be able to:

  • Explain fundamental differences between GPU and CPU architecture
  • Define basic CUDA terminology (kernel, thread, block, grid)
  • Successfully compile and run a simple CUDA program
  • Interpret deviceQuery output for their GPU
  • Identify when GPU acceleration might be beneficial

🛠️ Materials & Setup Needed

Required Materials

  • Laptop/Desktop with CUDA capability OR access to Google Colab
  • Text editor/IDE (VS Code, CLion, or simple text editor)
  • CUDA Toolkit installed (version 11.0+)
  • Access to readings - confirm everyone has links

Backup Options for Technical Issues

  1. Google Colab notebook with pre-written exercises
  2. Compiler Explorer (godbolt.org) for quick CUDA compilation
  3. Shared screen coding if individual setups fail

Optional Enhancements

  • Whiteboard/Digital board for drawing architecture diagrams
  • Shared document for collaborative notes
  • Recording setup if sessions will be recorded

📧 Post-Session Follow-up

Immediate Actions (within 24 hours)

  • Send session summary with key takeaways
  • Share exercise solutions and explanations
  • Distribute Week 2 materials and reading list
  • Address any unresolved technical issues individually

Week 2 Preparation

  • Update study plan based on group’s pace and interests
  • Prepare more challenging exercises if group is advanced
  • Create reference materials for common issues encountered

Continuous Improvement

  • Collect feedback via simple survey
  • Note time management - which sections need more/less time?
  • Track progress - are learning objectives being met?

🚨 Troubleshooting Common Issues

CUDA Installation Problems

  • Windows: Direct to CUDA installer, check Visual Studio compatibility
  • Linux: Package manager installation, driver conflicts
  • macOS: No native CUDA support - recommend Google Colab or a remote Linux machine

Exercise Compilation Errors

  • Path issues: Help locate nvcc
  • Library linking: Show basic compilation flags
  • Code errors: Common typos and fixes

Conceptual Confusion

  • Memory hierarchy: Use diagrams and analogies
  • Thread indexing: Work through examples step-by-step
  • Execution model: Compare to familiar parallel concepts

Session Success Criteria: Everyone leaves with a working CUDA setup and a clear understanding of GPU architecture fundamentals