The vLLM tour →
Seven core ideas that make vLLM fast, with one interactive diagram per concept.
- paged attention & the block table
- continuous (iteration-level) batching
- prefix caching across requests
- speculative decoding: draft & verify
- chunked prefill
- tensor parallelism & the all-reduce
- disaggregated prefill / decode
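The first idea on the list, paged attention, hinges on a per-sequence block table. A minimal sketch of that data structure (illustrative only, not vLLM's actual implementation; the class and names are assumptions): the KV cache is carved into fixed-size physical blocks, and each sequence maps logical block index to physical block id, so sequences grow one token at a time without needing contiguous memory.

```python
# Illustrative block table for paged attention (not vLLM's real code).
# KV memory is a pool of fixed-size physical blocks; each sequence holds
# a table mapping logical block index -> physical block id.

BLOCK_SIZE = 4  # tokens per KV block (small here for readability)

class BlockTable:
    def __init__(self, free_blocks):
        self.free = free_blocks   # shared pool of physical block ids
        self.table = []           # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self):
        """Reserve KV space for one new token, allocating a fresh
        physical block whenever we cross a block boundary."""
        if self.num_tokens % BLOCK_SIZE == 0:
            self.table.append(self.free.pop())
        self.num_tokens += 1

    def slot(self, pos):
        """Physical (block id, offset) holding the KV of token `pos`."""
        return self.table[pos // BLOCK_SIZE], pos % BLOCK_SIZE

free = list(range(100))
seq = BlockTable(free)
for _ in range(9):            # 9 tokens -> ceil(9/4) = 3 blocks
    seq.append_token()
print(len(seq.table))         # -> 3
print(seq.slot(6))            # -> (98, 2): second allocated block, offset 2
```

The point of the indirection is that the attention kernel reads KV through the table, so blocks can live anywhere in the pool, be shared between sequences (prefix caching), and be freed independently.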
PIC / Legolink →
A long, test-by-test walk-through of position-invariant chunks,
span_starts, and the Legolink gap policy that re-prefills
KV in place. Five interactive visualizations, one per pinning test.
- PIC chunk hash invariance
- span boundaries cutting the hash chain
- fan-in across requests
- gap intervals & in-place KV overwrite
- LL-FULL ≡ full-recompute equivalence
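The first three bullets above can be condensed into one hashing sketch. This is a hypothetical reconstruction, not the project's real API: assume chunks are hashed in a chain (as in ordinary prefix caching), but every boundary in `span_starts` cuts the chain. A chunk inside a span then hashes the same regardless of what preceded the span, which is what lets multiple requests fan in on the same cached KV.

```python
# Hypothetical sketch of position-invariant chunk hashing (names such as
# `span_starts` are assumptions based on the article's terminology).
import hashlib

CHUNK = 4  # tokens per chunk, illustrative size

def chunk_hashes(tokens, span_starts):
    hashes, h = [], b""
    for i in range(0, len(tokens) // CHUNK * CHUNK, CHUNK):
        if i in span_starts:   # span boundary: cut the hash chain
            h = b""
        h = hashlib.sha256(h + repr(tokens[i:i + CHUNK]).encode()).digest()
        hashes.append(h)
    return hashes

# Same span content behind two different prefixes: the span's chunk hash
# matches across requests (fan-in), while the prefix chunks differ.
r1 = chunk_hashes([1, 1, 1, 1] + [9, 9, 9, 9], span_starts={4})
r2 = chunk_hashes([7, 7, 7, 7] + [9, 9, 9, 9], span_starts={4})
print(r1[1] == r2[1])   # -> True: position-invariant span chunk
print(r1[0] == r2[0])   # -> False: prefixes still hash apart
```

Without the `span_starts` reset, the second chunk's hash would inherit the prefix's hash through the chain and the two requests could never share it.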