GPU Study Group - Session 1 Backup Plans & Extensions
🚀 If Session is Moving Too Fast (Extra Activities)
Advanced Discussion Topics (15-20 minutes each)
1. GPU Programming Models Comparison
Question: “How does CUDA compare to other GPU programming approaches?”
Talking points:
- OpenCL: Cross-platform but more verbose
- DirectCompute: Microsoft ecosystem
- HIP: AMD’s answer to CUDA (see the sketch after this list)
- Triton: High-level kernel development (preview of later sessions)
- Framework abstractions: PyTorch, TensorFlow CUDA kernels
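Showing, not telling, helps with the HIP point: a HIP kernel is nearly line-for-line CUDA. A minimal sketch, assuming a machine with hipcc installed (the kernel mirrors Exercise 1.5 below):

#include <hip/hip_runtime.h>  // HIP’s analogue of cuda_runtime.h

__global__ void vectorAdd(float* a, float* b, float* c, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // identical to the CUDA version
    if (idx < n) {
        c[idx] = a[idx] + b[idx];
    }
}

// Launch via the HIP macro (hipcc also accepts CUDA-style <<<blocks, threads>>>):
// hipLaunchKernelGGL(vectorAdd, dim3(blocks), dim3(threads), 0, 0, a, b, c, n);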
2. Real-World Performance Analysis
Activity: Analyze actual GPU specs and predict performance
Exercise: Given these applications, which GPU specs matter most? (A back-of-envelope example follows the list.)
- Machine learning training (memory bandwidth)
- Cryptocurrency mining (compute units)
- Video rendering (memory size)
- Scientific simulation (double precision)
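A quick calculation anchors the “memory bandwidth” answer. Vector addition does one floating-point operation per 12 bytes moved (two 4-byte reads, one 4-byte write), so its attainable rate is bandwidth-limited on any GPU. The numbers below are illustrative assumptions, not a specific GPU’s spec sheet:

#include <stdio.h>

int main(void) {
    // Illustrative (assumed) specs, not a real GPU
    double bandwidth_Bps  = 900e9;  // 900 GB/s memory bandwidth
    double peak_flops     = 15e12;  // 15 TFLOP/s single precision
    double bytes_per_flop = 12.0;   // vector add: 2 reads + 1 write per add

    double attainable = bandwidth_Bps / bytes_per_flop;  // bandwidth ceiling
    printf("Vector add ceiling: %.0f GFLOP/s (peak compute: %.0f GFLOP/s)\n",
           attainable / 1e9, peak_flops / 1e9);
    return 0;
}

75 GFLOP/s against a 15,000 GFLOP/s peak is the whole argument: for streaming workloads like this, bandwidth is the spec that matters.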
3. Memory Coalescing Introduction
Concept: Why memory access patterns matter on GPU
Simple demonstration:
// Bad: non-coalesced access. Each thread writes 32 floats (128 bytes)
// apart, so every thread in a warp touches a different memory segment.
__global__ void badAccess(float* data) {
    int idx = threadIdx.x;
    data[idx * 32] = idx;  // scattered, strided writes
}

// Good: coalesced access. Adjacent threads write adjacent addresses,
// so a warp's writes combine into a few wide memory transactions.
__global__ void goodAccess(float* data) {
    int idx = threadIdx.x;
    data[idx] = idx;  // sequential writes
}
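If the group wants to run this rather than just read it, a minimal host driver like the sketch below works with the kernels as written (they index with threadIdx.x only, so launch a single block). With one block the wall-clock difference is negligible; the contrast shows up in a profiler’s global-memory transaction counts.

#include <cuda_runtime.h>
#include <stdio.h>

// (badAccess and goodAccess kernels from above go here)

int main() {
    const int threads = 256;
    float* data;
    // The strided kernel touches indices up to (threads - 1) * 32.
    cudaMalloc(&data, (size_t)threads * 32 * sizeof(float));

    badAccess<<<1, threads>>>(data);
    goodAccess<<<1, threads>>>(data);
    cudaDeviceSynchronize();

    printf("Kernels ran; compare their memory transactions in a profiler\n");
    cudaFree(data);
    return 0;
}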
Extended Hands-On Exercises
Exercise 1.4: Thread Indexing Mastery (15 minutes)
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void threadInfo() {
    // Flatten 2D block and thread coordinates into single global IDs
    int blockId = blockIdx.x + blockIdx.y * gridDim.x;
    int threadId = blockId * (blockDim.x * blockDim.y) +
                   (threadIdx.y * blockDim.x) + threadIdx.x;
    printf("Global thread ID: %d (Block: %d, Thread: [%d,%d])\n",
           threadId, blockId, threadIdx.x, threadIdx.y);
}

// Try: threadInfo<<<dim3(2,2), dim3(3,2)>>>(); (4 blocks of 6 threads = 24 lines)
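A minimal main for the suggested launch (a sketch):

int main() {
    threadInfo<<<dim3(2, 2), dim3(3, 2)>>>();
    cudaDeviceSynchronize();  // wait so the device-side printf output flushes
    return 0;
}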
Exercise 1.5: Simple Vector Addition Setup (20 minutes)
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void vectorAdd(float* a, float* b, float* c, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {  // guard: the last block may have more threads than elements
        c[idx] = a[idx] + b[idx];
    }
}

int main() {
    // We'll implement the host side next week!
    printf("Vector addition kernel ready for next session\n");
    return 0;
}
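For a group racing ahead, the host-side skeleton below previews Session 2’s memory-management material. It is a sketch of the standard allocate / copy / launch / copy-back pattern, not the full walkthrough:

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host buffers (initialization omitted in this sketch)
    float *h_a = (float*)malloc(bytes);
    float *h_b = (float*)malloc(bytes);
    float *h_c = (float*)malloc(bytes);

    // Device buffers
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);

    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
    vectorAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);  // copy-back also synchronizes

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}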
⏰ If Session is Moving Too Slow (Streamlined Approach)
Quick Concept Review (10 minutes total)
Rapid-Fire Q&A Format
- “GPU has _____ cores, CPU has _____ cores” (thousands, 4-16)
- “CUDA kernel runs on _____” (device/GPU)
- “Host code runs on _____” (CPU)
- “<<<2,4>>> creates _____ total threads” (8)
- “cudaDeviceSynchronize() does _____” (waits for GPU)
Minimal Exercise Approach
Combined Exercise: Hello + Device Info (15 minutes)
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void deviceHello() {
    printf("Hello from thread %d\n",
           blockIdx.x * blockDim.x + threadIdx.x);
}

int main() {
    // Quick device info
    int deviceCount;
    cudaGetDeviceCount(&deviceCount);
    printf("Found %d CUDA device(s)\n", deviceCount);

    if (deviceCount > 0) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        printf("Using: %s\n", prop.name);

        // Simple kernel
        deviceHello<<<2, 3>>>();
        cudaDeviceSynchronize();
    }
    return 0;
}
Focus on Core Concepts Only
- Skip detailed memory hierarchy discussion
- Emphasize: GPU = many cores, CUDA = programming model
- Basic kernel syntax: __global__, <<<...>>>, cudaDeviceSynchronize()
🔧 Technical Troubleshooting Scenarios
Scenario 1: Multiple People Can’t Install CUDA
Immediate Solutions (5 minutes)
- Switch to Google Colab: Share pre-made notebook
- Demo mode: Host screen shares all exercises
- Conceptual focus: Skip hands-on, double down on discussion
Colab Quick Setup
1. Go to colab.research.google.com
2. New notebook → Runtime → Change runtime type → GPU
3. Test: !nvidia-smi
4. Write CUDA files with %%writefile magic
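For example, one cell writes the file and a second compiles and runs it (filename illustrative):

%%writefile hello.cu
#include <stdio.h>
__global__ void hello() { printf("Hello from the GPU\n"); }
int main() {
    hello<<<1, 4>>>();
    cudaDeviceSynchronize();
    return 0;
}

Then, in a second cell:

!nvcc hello.cu -o hello && ./hello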
Scenario 2: Wide Skill Level Differences
Pairing Strategy
- Buddy system: Pair experienced with beginners
- Breakout approach: Advanced group does extensions
- Helper roles: Experienced participants become teaching assistants
Modified Exercise Distribution
- Beginners: Focus on Exercise 1.2 (Hello CUDA)
- Intermediate: Add Exercise 1.4 (Thread indexing)
- Advanced: Start on vector addition preparation
Scenario 3: Conceptual Confusion
Visual Learning Aids
- Draw GPU architecture on shared whiteboard
- Physical analogies:
- CPU = Formula 1 car (fast, few)
- GPU = Bus fleet (slower individually, many)
- Step-by-step breakdown of kernel execution
Simplified Explanations
- Kernel: “Function that runs many times in parallel”
- Thread: “One copy of your function running”
- Block: “Group of threads that can share data”
- Grid: “All the blocks working on your problem”
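These four definitions map onto a single launch; a sketch, with an illustrative kernel name:

#include <cuda_runtime.h>

__global__ void work() {
    // the kernel: each thread runs one copy of this body
}

int main() {
    work<<<4, 64>>>();        // grid = 4 blocks; each block = 64 threads; 256 threads total
    cudaDeviceSynchronize();  // host waits for every block in the grid to finish
    return 0;
}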
📊 Session Assessment & Adaptation
Real-Time Feedback Indicators
Positive Signs
- Questions show understanding (“What if we used more blocks?”)
- Participants help each other with technical issues
- Discussion naturally extends concepts
- Successful exercise completion
Warning Signs
- Silence during discussion periods
- Same person always answering questions
- Repeated requests to explain basic concepts
- Technical issues overwhelming content
Adaptation Strategies
For Engaged Advanced Group
- Introduce optimization concepts early
- Discuss real-world applications
- Preview next week’s memory management
- Encourage experimentation with exercise variations
For Struggling Beginners
- More analogies and visual aids
- Slower pace through concepts
- Focus on “why” before “how”
- Ensure everyone completes basic exercises
For Mixed Group
- Use advanced participants as helpers
- Multiple difficulty levels for exercises
- Optional extension activities
- Clear prerequisite communication for next session
📝 Post-Session Adaptation Notes
Questions to Reflect On
- Timing: Which sections took longer/shorter than expected?
- Engagement: Where did participants seem most/least engaged?
- Technical: What technical issues should we prepare for next time?
- Content: What concepts need reinforcement in Session 2?
Adjustments for Week 2
- If behind: Review fundamental concepts at start of next session
- If ahead: Add more challenging memory management exercises
- If technical issues: Prepare better backup solutions
- If engagement low: More interactive elements, smaller groups
Communication Strategy
- Success stories: Highlight what went well
- Technical follow-up: Individual help for installation issues
- Expectation setting: Adjust Week 2 difficulty based on Session 1
- Resource sharing: Additional materials for different skill levels