GPU Study Group - Session 1 Backup Plans & Extensions

🚀 If Session is Moving Too Fast (Extra Activities)

Advanced Discussion Topics (15-20 minutes each)

1. GPU Programming Models Comparison

Question: “How does CUDA compare to other GPU programming approaches?”

Talking points:

  • OpenCL: Cross-platform (CPU, GPU, FPGA) but more verbose host code
  • DirectCompute: Microsoft’s DirectX compute API, Windows ecosystem
  • HIP: AMD’s CUDA-like API; CUDA code often ports with minimal changes
  • Triton: Python-embedded language for high-level kernel development (preview of later sessions)
  • Framework abstractions: most users invoke prewritten CUDA kernels through PyTorch or TensorFlow

2. Real-World Performance Analysis

Activity: Analyze actual GPU specs and predict performance

Exercise: Given these applications, which GPU specs matter most? (A worked bandwidth example follows the list.)

  • Machine learning training (memory bandwidth)
  • Cryptocurrency mining (compute units)
  • Video rendering (memory size)
  • Scientific simulation (double precision)
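
To make the first pairing concrete, here is a back-of-envelope arithmetic-intensity estimate. The GPU figures are illustrative round numbers, not the specs of any particular card:

Vector add: c[i] = a[i] + b[i]
Per element: 1 FLOP, 12 bytes of traffic (two 4-byte reads + one 4-byte write)
Arithmetic intensity = 1 / 12 ≈ 0.083 FLOP/byte

Hypothetical GPU: 1,000 GB/s memory bandwidth, 30 TFLOP/s FP32 peak
Bandwidth-limited rate ≈ 1,000 GB/s × 0.083 FLOP/byte ≈ 83 GFLOP/s

At well under 1% of the compute peak, memory bandwidth, not core count, decides performance for this workload.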

3. Memory Coalescing Introduction

Concept: Why memory access patterns matter on GPUs

Simple demonstration:

// Bad: Non-coalesced access
// Consecutive threads write addresses 32 floats (128 bytes) apart,
// so one warp's accesses scatter across many memory transactions
__global__ void badAccess(float* data) {
    int idx = threadIdx.x;
    data[idx * 32] = idx;  // Strided, scattered access
}
 
// Good: Coalesced access
// Consecutive threads write consecutive addresses, so the hardware
// can serve a whole warp with a minimal number of transactions
__global__ void goodAccess(float* data) {
    int idx = threadIdx.x;
    data[idx] = idx;  // Sequential access
}

Extended Hands-On Exercises

Exercise 1.4: Thread Indexing Mastery (15 minutes)

__global__ void threadInfo() {
    // Linear block index within the 2D grid
    int blockId = blockIdx.x + blockIdx.y * gridDim.x;
    // Threads in all earlier blocks, plus this thread's offset in its block
    int threadId = blockId * (blockDim.x * blockDim.y) + 
                   (threadIdx.y * blockDim.x) + threadIdx.x;
    
    printf("Global thread ID: %d (Block: %d, Thread: [%d,%d])\n",
           threadId, blockId, threadIdx.x, threadIdx.y);
}
 
// Try: threadInfo<<<dim3(2,2), dim3(3,2)>>>();  -> 4 blocks x 6 threads = 24 total
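
A minimal host wrapper for the suggested launch (a sketch; it assumes the threadInfo kernel above sits in the same file):

#include <cuda_runtime.h>
#include <stdio.h>
 
// ... threadInfo kernel from above ...
 
int main() {
    // 2x2 grid of blocks, each block 3x2 threads -> 24 threads total
    threadInfo<<<dim3(2, 2), dim3(3, 2)>>>();
    cudaDeviceSynchronize();  // wait so all printf output is flushed
    return 0;
}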

Exercise 1.5: Simple Vector Addition Setup (20 minutes)

#include <cuda_runtime.h>
#include <stdio.h>
 
__global__ void vectorAdd(float* a, float* b, float* c, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        c[idx] = a[idx] + b[idx];
    }
}
 
int main() {
    // We'll implement this next week!
    printf("Vector addition kernel ready for next session\n");
    return 0;
}
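
For groups moving quickly, below is a minimal sketch of the host side that next session covers in depth. It reuses the vectorAdd kernel and includes above (replacing the placeholder main) and omits error checking for brevity:

#include <stdlib.h>  // adds malloc/free to the includes above
 
int main() {
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
 
    // Host arrays with known contents so the result is easy to check
    float* a = (float*)malloc(bytes);
    float* b = (float*)malloc(bytes);
    float* c = (float*)malloc(bytes);
    for (int i = 0; i < n; i++) { a[i] = (float)i; b[i] = 2.0f * i; }
 
    // Device arrays plus host-to-device copies
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice);
 
    // One thread per element, rounding up to whole blocks
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vectorAdd<<<blocks, threads>>>(dA, dB, dC, n);
 
    // Copy back and spot-check: c[10] should be 10 + 20 = 30
    cudaMemcpy(c, dC, bytes, cudaMemcpyDeviceToHost);
    printf("c[10] = %.1f (expected 30.0)\n", c[10]);
 
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(a); free(b); free(c);
    return 0;
}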

⏰ If Session is Moving Too Slow (Streamlined Approach)

Quick Concept Review (10 minutes total)

Rapid-Fire Q&A Format

  1. “GPU has _____ cores, CPU has _____ cores” (thousands, 4-16)
  2. “CUDA kernel runs on _____” (device/GPU)
  3. “Host code runs on _____” (CPU)
  4. “<<<2,4>>> creates _____ total threads” (8)
  5. “cudaDeviceSynchronize() does _____” (waits for GPU)

Minimal Exercise Approach

Combined Exercise: Hello + Device Info (15 minutes)

#include <cuda_runtime.h>
#include <stdio.h>
 
__global__ void deviceHello() {
    printf("Hello from thread %d\n", 
           blockIdx.x * blockDim.x + threadIdx.x);
}
 
int main() {
    // Quick device info
    int deviceCount;
    cudaGetDeviceCount(&deviceCount);
    printf("Found %d CUDA device(s)\n", deviceCount);
    
    if (deviceCount > 0) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        printf("Using: %s\n", prop.name);
        
        // Simple kernel: 2 blocks x 3 threads = 6 "Hello" lines
        deviceHello<<<2, 3>>>();
        cudaDeviceSynchronize();
    }
    
    return 0;
}

Focus on Core Concepts Only

  • Skip detailed memory hierarchy discussion
  • Emphasize: GPU = many cores, CUDA = programming model
  • Basic kernel syntax: __global__, <<<>>>, cudaDeviceSynchronize()

🔧 Technical Troubleshooting Scenarios

Scenario 1: Multiple People Can’t Install CUDA

Immediate Solutions (5 minutes)

  1. Switch to Google Colab: Share pre-made notebook
  2. Demo mode: Host screen shares all exercises
  3. Conceptual focus: Skip hands-on, double down on discussion

Colab Quick Setup

1. Go to colab.research.google.com
2. New notebook → Runtime → Change runtime type → Hardware accelerator: GPU
3. Test: !nvidia-smi
4. Write CUDA files with %%writefile magic (cell sketch below)
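
A minimal two-cell sketch of that workflow; it assumes nvcc is preinstalled on the Colab GPU runtime (it normally is). Cell magics like %%writefile must be the first line of their cell.

Cell 1 (write the source file):

%%writefile hello.cu
#include <stdio.h>
 
__global__ void hello() {
    printf("Hello from thread %d\n", threadIdx.x);
}
 
int main() {
    hello<<<1, 4>>>();
    cudaDeviceSynchronize();
    return 0;
}

Cell 2 (compile and run):

!nvcc hello.cu -o hello && ./hello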

Scenario 2: Wide Skill Level Differences

Pairing Strategy

  • Buddy system: Pair experienced with beginners
  • Breakout approach: Advanced group does extensions
  • Helper roles: Experienced participants become teaching assistants

Modified Exercise Distribution

  • Beginners: Focus on Exercise 1.2 (Hello CUDA)
  • Intermediate: Add Exercise 1.4 (Thread indexing)
  • Advanced: Start on vector addition preparation

Scenario 3: Conceptual Confusion

Visual Learning Aids

  1. Draw GPU architecture on shared whiteboard
  2. Physical analogies:
    • CPU = Formula 1 car (fast, few)
    • GPU = Bus fleet (slower individually, many)
  3. Step-by-step breakdown of kernel execution

Simplified Explanations

  • Kernel: “Function that runs many times in parallel”
  • Thread: “One copy of your function running”
  • Block: “Group of threads that can share data”
  • Grid: “All the blocks working on your problem” (mapped to code below)
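
A minimal sketch (not from the session materials) that pins each of the four terms to a line of code:

#include <cuda_runtime.h>
#include <stdio.h>
 
// Kernel: the function that runs many times in parallel
__global__ void work() {
    int t = threadIdx.x;  // Thread: one copy of the function running
    int b = blockIdx.x;   // Block: the group of threads this copy belongs to
    printf("copy %d in group %d\n", t, b);
}
 
int main() {
    // Grid: all the blocks working on the problem
    // 4 blocks x 8 threads = 32 parallel copies of work()
    work<<<4, 8>>>();
    cudaDeviceSynchronize();
    return 0;
}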

📊 Session Assessment & Adaptation

Real-Time Feedback Indicators

Positive Signs

  • Questions show understanding (“What if we used more blocks?”)
  • Participants help each other with technical issues
  • Discussion naturally extends concepts
  • Successful exercise completion

Warning Signs

  • Silence during discussion periods
  • Same person always answering questions
  • Repeated requests to explain basic concepts
  • Technical issues overwhelming content

Adaptation Strategies

For Engaged Advanced Group

  • Introduce optimization concepts early
  • Discuss real-world applications
  • Preview next week’s memory management
  • Encourage experimentation with exercise variations

For Struggling Beginners

  • More analogies and visual aids
  • Slower pace through concepts
  • Focus on “why” before “how”
  • Ensure everyone completes basic exercises

For Mixed Group

  • Use advanced participants as helpers
  • Multiple difficulty levels for exercises
  • Optional extension activities
  • Clear prerequisite communication for next session

📝 Post-Session Adaptation Notes

Questions to Reflect On

  1. Timing: Which sections took longer/shorter than expected?
  2. Engagement: Where did participants seem most/least engaged?
  3. Technical: What technical issues should we prepare for next time?
  4. Content: What concepts need reinforcement in Session 2?

Adjustments for Week 2

  • If behind: Review fundamental concepts at start of next session
  • If ahead: Add more challenging memory management exercises
  • If technical issues: Prepare better backup solutions
  • If engagement low: More interactive elements, smaller groups

Communication Strategy

  • Success stories: Highlight what went well
  • Technical follow-up: Individual help for installation issues
  • Expectation setting: Adjust Week 2 difficulty based on Session 1
  • Resource sharing: Additional materials for different skill levels