GPU Study Group - Session 1: Detailed Host Plan
Week 1: GPU Architecture & CUDA Setup
📋 Pre-Session Preparation Checklist (Host)
Technical Setup (Do 1-2 days before)
- Test your own CUDA environment - Run all three exercises yourself
- Prepare backup options for attendees without CUDA:
- Google Colab notebook with CUDA runtime
- Online CUDA compiler link (godbolt.org)
- Virtual machine with CUDA pre-installed
- Screen sharing setup - Test ability to share terminal and IDE
- Prepare demo materials - Have deviceQuery output ready to show
Content Preparation
- Review all assigned materials - Take notes on key concepts
- Prepare visual aids (optional):
- GPU vs CPU architecture diagram
- CUDA memory hierarchy diagram
- Thread hierarchy visualization
- Create shared document for group notes/questions
Logistics
- Send reminder 24 hours before with:
- Meeting link and time
- Reminder to complete readings
- CUDA installation instructions (if not done)
- Prepare attendance tracking and progress monitoring
🕐 Detailed Session Agenda (90 minutes)
Opening (10 minutes)
[0:00-0:10]
Host Script:
“Welcome to our first GPU programming study group! Over the next 12 weeks, we’ll go from CUDA beginners to writing efficient GPU kernels. Today we’re building the foundation - understanding WHY GPUs work the way they do.”
Activities:
- Quick introductions (name, background, why interested in GPU programming)
- Outline today’s agenda
- Set expectations for participation
Knowledge Check & Reading Review (15 minutes)
[0:10-0:25]
Host facilitation: Start with quick polls to gauge preparation level:
- “Show of hands - who completed all the readings?”
- “Who successfully installed CUDA and ran deviceQuery?”
- “What was the most confusing concept from the readings?”
Quick concept review (5 minutes each):
- GPU vs CPU architecture - Have someone explain in their own words
- CUDA terminology - Quick definitions round-robin
- Development workflow - Walk through compile and run process
Core Discussion: Architecture Deep Dive (25 minutes)
[0:25-0:50]
Discussion Point 1: “Why do GPUs excel at parallel tasks that CPUs struggle with?” (8 minutes)
Facilitation approach:
- Start with open responses, then guide toward key concepts
- Look for these concepts to emerge:
- Thousands of cores vs 4-16 cores
- SIMT execution model
- Memory bandwidth vs latency
- Different design philosophies
Guiding questions if discussion stalls:
- “What happens when a CPU encounters a cache miss?”
- “How does a GPU handle the same situation differently?”
- “When would you NOT want to use GPU acceleration?”
Discussion Point 2: “How does the SIMT execution model impact algorithm design?” (8 minutes)
Key concepts to draw out:
- All threads in warp execute same instruction
- Branch divergence performance impact
- Data parallelism vs task parallelism
- Algorithm restructuring for GPU efficiency
Real-world examples to mention:
- Matrix multiplication (perfect fit)
- Sorting algorithms (requires redesign)
- Tree traversal (challenging)
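If the branch-divergence point needs something concrete, a minimal sketch the host could show (hypothetical kernel and variable names, not from the course materials):

```cuda
#include <cstdio>

__global__ void divergent(float *data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Threads within one warp take different paths here, so the warp
    // executes BOTH branches serially, masking off inactive threads.
    if (i % 2 == 0) {
        data[i] *= 2.0f;   // even-indexed threads
    } else {
        data[i] += 1.0f;   // odd-indexed threads
    }
}
```

Worth pointing out: divergence happens per warp (32 threads), so branching on `i % 2` splits every warp, while branching on something warp-uniform like `(i / 32) % 2` would not.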
Discussion Point 3: “What development challenges do you anticipate?” (9 minutes)
Expected responses:
- Debugging parallel code
- Memory management complexity
- Performance optimization difficulty
- Different programming model
Host role: Acknowledge challenges but emphasize they’re solvable
Hands-On Exercise Session (30 minutes)
[0:50-1:20]
Exercise 1.1: Environment Verification (8 minutes)
[0:50-0:58]
Host demonstration:
- Show nvcc --version output on your system
- Demonstrate nvidia-smi and explain key information
- Walk through deviceQuery compilation and key output metrics
Group activity:
- Everyone runs commands simultaneously
- Troubleshoot issues collectively
- Share interesting deviceQuery findings
Exercise 1.2: Hello CUDA Analysis (12 minutes)
[0:58-1:10]
Before coding - concept check (3 minutes):
- “What does <<<2, 4>>> mean?”
- “How many total threads will be created?”
- “What output do we expect?”
Live coding session (6 minutes):
- Host shares screen and codes exercise
- Explain each line as you type
- Compile and run, show output
Experimentation phase (3 minutes):
- Challenge: “Try <<<3, 2>>> and <<<1, 8>>>”
- “What patterns do you notice in the output?”
- “What happens with <<<1, 1024>>>?”
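For the host's reference, a minimal version of the kind of program this exercise uses (a sketch; the actual Exercise 1.2 code may differ):

```cuda
#include <cstdio>

__global__ void hello() {
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
    // <<<2, 4>>> launches 2 blocks of 4 threads each = 8 total threads
    hello<<<2, 4>>>();
    cudaDeviceSynchronize();  // wait for the kernel so its printf output appears
    return 0;
}
```

Compile with `nvcc hello.cu -o hello`. Note for the `<<<1, 1024>>>` challenge: 1024 is exactly the max-threads-per-block limit on modern GPUs (deviceQuery reports it), so it works, but anything larger fails to launch.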
Exercise 1.3: Hardware Exploration (10 minutes)
[1:10-1:20]
Structured sharing:
- Each person shares one interesting spec from their GPU
- Create group comparison chart on shared doc
- Discuss implications for programming:
- “Why does compute capability matter?”
- “How does memory size affect problem size?”
- “What does max threads per block tell us?”
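If anyone wants to pull the same specs programmatically instead of reading deviceQuery output, a short sketch using the CUDA runtime API:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // query device 0
    printf("Name: %s\n", prop.name);
    printf("Compute capability: %d.%d\n", prop.major, prop.minor);
    printf("Global memory: %zu MB\n", prop.totalGlobalMem >> 20);
    printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("Warp size: %d\n", prop.warpSize);
    return 0;
}
```

These fields map directly onto the discussion questions above: compute capability gates which features compile, total global memory bounds problem size, and max threads per block constrains launch configurations.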
Wrap-up & Next Week Preview (10 minutes)
[1:20-1:30]
Key Takeaways Review (5 minutes)
Ask group to complete these statements:
- “GPUs are good at _________ because _________”
- “The biggest difference from CPU programming is _________”
- “CUDA’s thread hierarchy consists of _________”
Week 2 Preview (3 minutes)
- Topic: Memory Management & Basic Kernels
- Key focus: Understanding GPU memory types and writing first computational kernels
- Preparation reminder: Send materials list via email
Action Items (2 minutes)
- Complete any exercises not finished
- Set up development environment if needed
- Optional: Explore CUDA samples directory
🎯 Discussion Facilitation Guide
Effective Facilitation Techniques
For Quiet Groups:
- Direct questions: “Sarah, what did you think about the memory hierarchy?”
- Think-pair-share: 30 seconds to think, discuss with neighbor, then share
- Written responses: Use shared doc for anonymous input
For Dominant Speakers:
- Redirect: “That’s a great point, John. Let’s hear other perspectives.”
- Time limits: “Let’s get 2-3 more opinions on this”
- Round-robin: Structure equal speaking time
When Discussion Goes Off-Track:
- Acknowledge: “Interesting point - let’s revisit that during break”
- Redirect: “How does this connect to our GPU architecture focus?”
- Table it: “Let’s add that to our ‘questions for later’ list”
Expected Learning Outcomes
By end of session, participants should be able to:
- Explain fundamental differences between GPU and CPU architecture
- Define basic CUDA terminology (kernel, thread, block, grid)
- Successfully compile and run a simple CUDA program
- Interpret deviceQuery output for their GPU
- Identify when GPU acceleration might be beneficial
🛠️ Materials & Setup Needed
Required Materials
- Laptop/Desktop with CUDA capability OR access to Google Colab
- Text editor/IDE (VS Code, CLion, or simple text editor)
- CUDA Toolkit installed (version 11.0+)
- Access to readings - confirm everyone has links
Backup Options for Technical Issues
- Google Colab notebook with pre-written exercises
- Compiler Explorer (godbolt.org) for quick CUDA compilation
- Shared screen coding if individual setups fail
Optional Enhancements
- Whiteboard/Digital board for drawing architecture diagrams
- Shared document for collaborative notes
- Recording setup if sessions will be recorded
📧 Post-Session Follow-up
Immediate Actions (within 24 hours)
- Send session summary with key takeaways
- Share exercise solutions and explanations
- Distribute Week 2 materials and reading list
- Address any unresolved technical issues individually
Week 2 Preparation
- Update study plan based on group’s pace and interests
- Prepare more challenging exercises if group is advanced
- Create reference materials for common issues encountered
Continuous Improvement
- Collect feedback via simple survey
- Note time management - which sections need more/less time?
- Track progress - are learning objectives being met?
🚨 Troubleshooting Common Issues
CUDA Installation Problems
- Windows: Direct to CUDA installer, check Visual Studio compatibility
- Linux: Package manager installation, driver conflicts
- macOS: No native CUDA support - recommend Google Colab or a remote Linux machine (Docker on macOS cannot access an NVIDIA GPU)
Exercise Compilation Errors
- Path issues: Help locate nvcc
- Library linking: Show basic compilation flags
- Code errors: Common typos and fixes
Conceptual Confusion
- Memory hierarchy: Use diagrams and analogies
- Thread indexing: Work through examples step-by-step
- Execution model: Compare to familiar parallel concepts
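For the thread-indexing confusion, one worked example usually resolves it (an illustrative kernel, not from the course materials): with a `<<<3, 4>>>` launch, the standard formula gives every thread a unique global id from 0 to 11.

```cuda
#include <cstdio>

__global__ void show_index() {
    // Global index = block offset + position within the block.
    // With <<<3, 4>>>: block 0 -> ids 0..3, block 1 -> 4..7, block 2 -> 8..11
    int global_id = blockIdx.x * blockDim.x + threadIdx.x;
    printf("block %d, thread %d -> global %d\n",
           blockIdx.x, threadIdx.x, global_id);
}
```

Walking through two or three threads by hand on the whiteboard (e.g., block 1, thread 2 gives 1 × 4 + 2 = 6) is usually more effective than stating the formula.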
Session Success Criteria: Everyone leaves with working CUDA setup and clear understanding of GPU architecture fundamentals