Posts
2025
- 08-13
Qwen3-Reranker CLS-style Refactor Explained: From LM Head to Score Head
A system-level deep dive into refactoring Qwen3-Reranker from an LM head (H→V) to a score head (H→1): correct pooling for a decoder-only model, FLOPs math for the 0.6B model, ~296 MiB of VRAM savings, bandwidth effects, pitfalls, and a practical deployment checklist, plus open-source repos and a preview of Triton deployment with an OpenAI-style API.
- 08-06
Welcome to My Blog!
Hi! I'm an AI developer who loves building real-time systems and writing about practical machine learning, systems, and tooling. Glad you're here.