Posts

2025

  • Qwen3-Reranker CLS-style Refactor Explained: From LM Head to Score Head
    08-13

    A system-level deep dive into refactoring Qwen3-Reranker from an LM head (H→V) to a score head (H→1): correct decoder-only pooling, 0.6B FLOPs math, ~296 MiB VRAM savings, bandwidth effects, pitfalls, and a practical deployment checklist, plus open-source repos and a preview of Triton deployment with an OpenAI-style API.

  • Welcome to My Blog!
    08-06

    Hi! I'm an AI developer who loves building real-time systems and writing about practical machine learning, systems, and tooling. Glad you're here.

©2025 Shawn.