Alex Xi's Notes

    Notes On Finetuning

    May 20, 2025

    Papers

    • DPO: https://arxiv.org/pdf/2305.18290
    • DPO Failure Mode: https://arxiv.org/pdf/2402.13228
    • SimPO: https://arxiv.org/pdf/2405.14734
    • CPO: https://arxiv.org/pdf/2401.08417
    • KTO: https://arxiv.org/pdf/2402.01306
    • RPO: https://arxiv.org/pdf/2402.10958
    • PPO: https://www.adaptive-ml.com/post/from-zero-to-ppo
    • SPIN: https://arxiv.org/pdf/2401.01335
    • Online vs. Offline Alignment: https://arxiv.org/abs/2405.08448
    • DPO vs. PPO: https://arxiv.org/pdf/2404.10719, https://arxiv.org/pdf/2406.09279
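
    For quick reference, the per-pair objective from the DPO paper linked above is -log σ(β · [(log πθ(y_w|x) - log π_ref(y_w|x)) - (log πθ(y_l|x) - log π_ref(y_l|x))]). The block below is a minimal pure-Python sketch of that loss, not any framework's implementation. It assumes you already have the summed token log-probabilities of the chosen (y_w) and rejected (y_l) responses under both the policy and a frozen reference model. The function names and the default beta=0.1 are illustrative choices.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    # For negative z, rewrite to avoid overflow in exp(-z).
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative default; controls deviation from the reference
) -> float:
    """DPO loss for one preference pair (sketch; inputs are summed token log-probs)."""
    # Implicit reward of each response: how much the policy moved relative to the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # Logistic loss on the margin: pushes the chosen response above the rejected one.
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # Policy identical to reference: margin is 0, loss is log 2.
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy raises the chosen response's log-prob: loss drops below log 2.
    print(dpo_loss(-9.0, -12.0, -10.0, -12.0))
```

    When the policy still matches the reference, the loss is exactly log 2. It falls as the policy shifts probability toward the chosen response relative to the reference, so β sets how strongly that shift is rewarded.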

    Discussions

    • https://x.com/Teknium1/status/1869136010053140926
    • https://x.com/Teknium1/status/1818012735210405920
    • https://x.com/kalomaze/status/1834402347755143168
    • https://x.com/kalomaze/status/1876302592202195035
    • https://x.com/EsotericCofe/status/1876266464468189252
    • https://www.blog.chai-research.com/post/chai-gpt-rlhf-part-i-reward-modelling

    Frameworks

    • Axolotl: https://github.com/axolotl-ai-cloud/axolotl

