Hello!
Thanks for stopping by. I'm an AI developer focused on building reliable, low-latency systems—think real-time speech-to-text, face recognition, vector search, and scalable LLM services. I care about practical engineering: measurable performance, clear architectures, and code that holds up in production.
What you'll find here
- Hands-on guides for deploying and operating AI workloads (Ray/vLLM, Triton, FastAPI, Next.js, etc.).
- Fine-tuning & training: LoRA/QLoRA/PEFT, SFT/DPO/ORPO, dataset curation & cleaning, mixed precision, FSDP/DeepSpeed, and experiment tracking (W&B/MLflow).
- Computer vision: detection/segmentation (YOLO/RT-DETR), OCR, face recognition with embeddings and re-identification, plus production-grade augmentation pipelines.
- LLMs & RAG/agents: vector stores (FAISS/OpenSearch/Supabase), retrieval pipelines, evaluation (e.g., RAGAS), prompt engineering, and lightweight agent patterns.
- Benchmarks and notes on latency, throughput, memory, and cost.
- System design write-ups: data pipelines, streaming, observability, and scaling patterns.
- Code snippets & checklists you can drop into your own projects.
My approach
- Build first, optimize second. Ship something that works, then profile and iterate.
- Measure everything. If we can't measure it, we can't improve it.
- Keep it simple. Prefer boring, proven tools over flashy complexity.
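
As a small illustration of the "measure everything" point, here is a minimal sketch of timing a function and reporting latency percentiles using only the Python standard library. The names `measure_latency` and `fake_inference` are hypothetical and exist only for this example. The workload is a stand-in for a real model call, and the nearest-rank p95 is one simple choice among several percentile definitions.

```python
import math
import statistics
import time


def measure_latency(fn, runs=50, warmup=5):
    """Time repeated calls to fn and return latency stats in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not recorded
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank p95: smallest sample with at least 95% of samples at or below it
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "runs": len(samples),
        "mean_ms": statistics.fmean(samples),
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


def fake_inference():
    # Hypothetical stand-in for a real model call
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(measure_latency(fake_inference))
```

Getting a baseline like this before changing anything keeps "optimize second" honest: the same function can be rerun after each change to check whether the tail latency actually moved.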
Let's connect
I'd love to hear what you're building and the challenges you're facing.
- Leave a comment below to start a thread with the community.
- Or reach me directly at mail@201lab.top.
If a post helps you, consider sharing it. Your feedback and questions often shape what I write next. Welcome aboard, and enjoy the read!