GPU Study Group - Session 1 Backup Plans & Extensions
🚀 If Session is Moving Too Fast (Extra Activities)
Advanced Discussion Topics (15-20 minutes each)
1. GPU Programming Models Comparison
Question: “How does CUDA compare to other GPU programming approaches?”
Talking points:
- OpenCL: Cross-platform but more verbose
- DirectCompute: Microsoft ecosystem
- HIP: AMD’s answer to CUDA (see the sketch after this list)
- Triton: High-level kernel development (preview of later sessions)
- Framework abstractions: PyTorch, TensorFlow CUDA kernels
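Showing, not telling, helps with the HIP point: a HIP kernel is nearly line-for-line CUDA. A minimal sketch, assuming a machine with hipcc installed (the kernel mirrors Exercise 1.5 below):

#include <hip/hip_runtime.h>  // HIP’s analogue of cuda_runtime.h

__global__ void vectorAdd(float* a, float* b, float* c, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // identical to the CUDA version
    if (idx < n) {
        c[idx] = a[idx] + b[idx];
    }
}

// Launch via the HIP macro (hipcc also accepts CUDA-style <<<blocks, threads>>>):
// hipLaunchKernelGGL(vectorAdd, dim3(blocks), dim3(threads), 0, 0, a, b, c, n);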
2. Real-World Performance Analysis
Activity: Analyze actual GPU specs and predict performance
Exercise: Given these applications, which GPU specs matter most? (A back-of-envelope example follows the list.)
- Machine learning training (memory bandwidth)
- Cryptocurrency mining (compute units)
- Video rendering (memory size)
- Scientific simulation (double precision)
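A quick calculation anchors the “memory bandwidth” answer. Vector addition does one floating-point operation per 12 bytes moved (two 4-byte reads, one 4-byte write), so its attainable rate is bandwidth-limited on any GPU. The numbers below are illustrative assumptions, not a specific GPU’s spec sheet:

#include <stdio.h>

int main(void) {
    // Illustrative (assumed) specs, not a real GPU
    double bandwidth_Bps  = 900e9;  // 900 GB/s memory bandwidth
    double peak_flops     = 15e12;  // 15 TFLOP/s single precision
    double bytes_per_flop = 12.0;   // vector add: 2 reads + 1 write per add

    double attainable = bandwidth_Bps / bytes_per_flop;  // bandwidth ceiling
    printf("Vector add ceiling: %.0f GFLOP/s (peak compute: %.0f GFLOP/s)\n",
           attainable / 1e9, peak_flops / 1e9);
    return 0;
}

75 GFLOP/s against a 15,000 GFLOP/s peak is the whole argument: for streaming workloads like this, bandwidth is the spec that matters.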
3. Memory Coalescing Introduction
Concept: Why memory access patterns matter on GPU
Simple demonstration:
// Bad: non-coalesced access. Each thread writes 32 floats (128 bytes)
// apart, so every thread in a warp touches a different memory segment.
__global__ void badAccess(float* data) {
    int idx = threadIdx.x;
    data[idx * 32] = idx;  // scattered, strided writes
}

// Good: coalesced access. Adjacent threads write adjacent addresses,
// so a warp's writes combine into a few wide memory transactions.
__global__ void goodAccess(float* data) {
    int idx = threadIdx.x;
    data[idx] = idx;  // sequential writes
}
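If the group wants to run this rather than just read it, a minimal host driver like the sketch below works with the kernels as written (they index with threadIdx.x only, so launch a single block). With one block the wall-clock difference is negligible; the contrast shows up in a profiler’s global-memory transaction counts.

#include <cuda_runtime.h>
#include <stdio.h>

// (badAccess and goodAccess kernels from above go here)

int main() {
    const int threads = 256;
    float* data;
    // The strided kernel touches indices up to (threads - 1) * 32.
    cudaMalloc(&data, (size_t)threads * 32 * sizeof(float));

    badAccess<<<1, threads>>>(data);
    goodAccess<<<1, threads>>>(data);
    cudaDeviceSynchronize();

    printf("Kernels ran; compare their memory transactions in a profiler\n");
    cudaFree(data);
    return 0;
}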
Extended Hands-On Exercises
Exercise 1.4: Thread Indexing Mastery (15 minutes)
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void threadInfo() {
    // Flatten 2D block and thread coordinates into single global IDs
    int blockId = blockIdx.x + blockIdx.y * gridDim.x;
    int threadId = blockId * (blockDim.x * blockDim.y) +
                   (threadIdx.y * blockDim.x) + threadIdx.x;
    printf("Global thread ID: %d (Block: %d, Thread: [%d,%d])\n",
           threadId, blockId, threadIdx.x, threadIdx.y);
}

// Try: threadInfo<<<dim3(2,2), dim3(3,2)>>>(); (4 blocks of 6 threads = 24 lines)
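A minimal main for the suggested launch (a sketch):

int main() {
    threadInfo<<<dim3(2, 2), dim3(3, 2)>>>();
    cudaDeviceSynchronize();  // wait so the device-side printf output flushes
    return 0;
}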
Exercise 1.5: Simple Vector Addition Setup (20 minutes)
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void vectorAdd(float* a, float* b, float* c, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {  // guard: the last block may have more threads than elements
        c[idx] = a[idx] + b[idx];
    }
}

int main() {
    // We'll implement the host side next week!
    printf("Vector addition kernel ready for next session\n");
    return 0;
}
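For a group racing ahead, the host-side skeleton below previews Session 2’s memory-management material. It is a sketch of the standard allocate / copy / launch / copy-back pattern, not the full walkthrough:

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host buffers (initialization omitted in this sketch)
    float *h_a = (float*)malloc(bytes);
    float *h_b = (float*)malloc(bytes);
    float *h_c = (float*)malloc(bytes);

    // Device buffers
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);

    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
    vectorAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);  // copy-back also synchronizes

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}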
⏰ If Session is Moving Too Slow (Streamlined Approach)
Quick Concept Review (10 minutes total)
Rapid-Fire Q&A Format
- “GPU has _____ cores, CPU has _____ cores” (thousands, 4-16)
- “CUDA kernel runs on _____” (device/GPU)
- “Host code runs on _____” (CPU)
- “<<<2,4>>> creates _____ total threads” (8)
- “cudaDeviceSynchronize() does _____” (waits for GPU)
Minimal Exercise Approach
Combined Exercise: Hello + Device Info (15 minutes)
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void deviceHello() {
    printf("Hello from thread %d\n",
           blockIdx.x * blockDim.x + threadIdx.x);
}

int main() {
    // Quick device info
    int deviceCount;
    cudaGetDeviceCount(&deviceCount);
    printf("Found %d CUDA device(s)\n", deviceCount);

    if (deviceCount > 0) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        printf("Using: %s\n", prop.name);

        // Simple kernel
        deviceHello<<<2, 3>>>();
        cudaDeviceSynchronize();
    }
    return 0;
}
Focus on Core Concepts Only
- Skip detailed memory hierarchy discussion
- Emphasize: GPU = many cores, CUDA = programming model
- Basic kernel syntax: __global__, <<<...>>>, cudaDeviceSynchronize()
🔧 Technical Troubleshooting Scenarios
Scenario 1: Multiple People Can’t Install CUDA
Immediate Solutions (5 minutes)
- Switch to Google Colab: Share pre-made notebook
- Demo mode: Host screen shares all exercises
- Conceptual focus: Skip hands-on, double down on discussion
Colab Quick Setup
1. Go to colab.research.google.com
2. New notebook → Runtime → Change runtime type → GPU
3. Test: !nvidia-smi
4. Write CUDA files with %%writefile magic
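For example, one cell writes the file and a second compiles and runs it (filename illustrative):

%%writefile hello.cu
#include <stdio.h>
__global__ void hello() { printf("Hello from the GPU\n"); }
int main() {
    hello<<<1, 4>>>();
    cudaDeviceSynchronize();
    return 0;
}

Then, in a second cell:

!nvcc hello.cu -o hello && ./hello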
Scenario 2: Wide Skill Level Differences
Pairing Strategy
- Buddy system: Pair experienced with beginners
- Breakout approach: Advanced group does extensions
- Helper roles: Experienced participants become teaching assistants
Modified Exercise Distribution
- Beginners: Focus on Exercise 1.2 (Hello CUDA)
- Intermediate: Add Exercise 1.4 (Thread indexing)
- Advanced: Start on vector addition preparation
Scenario 3: Conceptual Confusion
Visual Learning Aids
- Draw GPU architecture on shared whiteboard
- Physical analogies:
- CPU = Formula 1 car (fast, few)
- GPU = Bus fleet (slower individually, many)
- Step-by-step breakdown of kernel execution
Simplified Explanations
- Kernel: “Function that runs many times in parallel”
- Thread: “One copy of your function running”
- Block: “Group of threads that can share data”
- Grid: “All the blocks working on your problem”
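These four definitions map onto a single launch; a sketch, with an illustrative kernel name:

#include <cuda_runtime.h>

__global__ void work() {
    // the kernel: each thread runs one copy of this body
}

int main() {
    work<<<4, 64>>>();        // grid = 4 blocks; each block = 64 threads; 256 threads total
    cudaDeviceSynchronize();  // host waits for every block in the grid to finish
    return 0;
}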
📊 Session Assessment & Adaptation
Real-Time Feedback Indicators
Positive Signs
- Questions show understanding (“What if we used more blocks?”)
- Participants help each other with technical issues
- Discussion naturally extends concepts
- Successful exercise completion
Warning Signs
- Silence during discussion periods
- Same person always answering questions
- Repeated requests to explain basic concepts
- Technical issues overwhelming content
Adaptation Strategies
For Engaged Advanced Group
- Introduce optimization concepts early
- Discuss real-world applications
- Preview next week’s memory management
- Encourage experimentation with exercise variations
For Struggling Beginners
- More analogies and visual aids
- Slower pace through concepts
- Focus on “why” before “how”
- Ensure everyone completes basic exercises
For Mixed Group
- Use advanced participants as helpers
- Multiple difficulty levels for exercises
- Optional extension activities
- Clear prerequisite communication for next session
📝 Post-Session Adaptation Notes
Questions to Reflect On
- Timing: Which sections took longer/shorter than expected?
- Engagement: Where did participants seem most/least engaged?
- Technical: What technical issues should we prepare for next time?
- Content: What concepts need reinforcement in Session 2?
Adjustments for Week 2
- If behind: Review fundamental concepts at start of next session
- If ahead: Add more challenging memory management exercises
- If technical issues: Prepare better backup solutions
- If engagement low: More interactive elements, smaller groups
Communication Strategy
- Success stories: Highlight what went well
- Technical follow-up: Individual help for installation issues
- Expectation setting: Adjust Week 2 difficulty based on Session 1
- Resource sharing: Additional materials for different skill levels