vllm · paged attention · kv cache

Illustrated vLLM

A small set of interactive, visual walk-throughs of how vLLM serves large language models - the moving parts behind paged attention, continuous batching, prefix caching, and the position-invariant KV cache (PIC) work in vLLM's V1 engine.

Click on any diagram to interact.

start here

The vLLM tour

Seven core ideas that make vLLM fast - one diagram per concept, each one interactive (a small code sketch of the paged-attention bookkeeping follows below).

/vllm
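
Paged attention is easy to preview in code. The sketch below is a toy model under assumed names (BLOCK_SIZE, BlockTable, slot_for are illustrative, not vLLM's API): it shows the one data structure the diagram animates, a per-request block table that maps logical token positions onto physical KV blocks drawn from a shared pool, so a sequence's cache never needs contiguous memory.

```python
# Minimal sketch of paged-attention bookkeeping; all names are illustrative,
# not vLLM internals. Each request's KV cache lives in fixed-size blocks, and
# a per-request table maps logical block indices to physical block ids.

BLOCK_SIZE = 16  # tokens stored per KV block

class BlockTable:
    """Per-request mapping: logical block index -> physical block id."""

    def __init__(self) -> None:
        self.physical: list[int] = []

    def slot_for(self, token_pos: int, free_pool: list[int]) -> tuple[int, int]:
        """Return (physical_block, offset) for a token, allocating on demand."""
        logical = token_pos // BLOCK_SIZE
        while len(self.physical) <= logical:
            self.physical.append(free_pool.pop())  # grab any free block
        return self.physical[logical], token_pos % BLOCK_SIZE


free_pool = list(range(1024))         # ids of free physical blocks (no tensors here)
table = BlockTable()
print(table.slot_for(0, free_pool))   # (1023, 0): first block, offset 0
print(table.slot_for(17, free_pool))  # (1022, 1): second block, offset 1
```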
deep dive

PIC / Legolink

A long, test-by-test walk-through of position-invariant chunks, span_starts, and the Legolink gap policy that re-prefills KV in place. Six interactive visualizations, one per pinning test (a toy code sketch of the span-and-gap idea follows below).

/pic
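
For a taste of what the /pic walk-through covers, here is a toy model of the central idea: cached chunks carry KV for spans of tokens, and when a prompt reuses those chunks at new offsets, only the uncovered gaps between spans need a fresh prefill. Apart from the span_starts name, everything here (gaps_to_prefill and its signature) is hypothetical, not the Legolink implementation.

```python
# Toy model (not vLLM code) of position-invariant chunk reuse: given where
# cached spans land in a new prompt (span_starts) and how long they are,
# compute the [start, end) token ranges that still need a fresh prefill.
# gaps_to_prefill is a hypothetical helper invented for this sketch.

def gaps_to_prefill(span_starts: list[int], lengths: list[int],
                    prompt_len: int) -> list[tuple[int, int]]:
    """Return the token ranges not covered by any cached span."""
    gaps, cursor = [], 0
    for start, length in sorted(zip(span_starts, lengths)):
        if start > cursor:
            gaps.append((cursor, start))      # hole before this span: prefill it
        cursor = max(cursor, start + length)  # spans may overlap; keep the max
    if cursor < prompt_len:
        gaps.append((cursor, prompt_len))     # tail after the last cached span
    return gaps


# Two cached chunks land at offsets 4 and 20 of a 32-token prompt:
print(gaps_to_prefill([4, 20], [8, 8], 32))  # [(0, 4), (12, 20), (28, 32)]
```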

Why this site

vLLM is one of the most-deployed LLM inference engines, and a lot of its cleverness is structural: how memory is laid out, how requests are scheduled, how the KV cache is reused. Diagrams help.
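
One concrete example of that structural cleverness: prefix caching keys each full KV block by a hash chained over everything before it, so two requests that share a prompt prefix resolve to the same cached blocks. A minimal sketch of that keying scheme, with an illustrative helper name (block_keys) rather than vLLM's internals:

```python
# Minimal sketch of chained block hashing for prefix caching; block_keys is an
# illustrative helper, not vLLM's API. key_i depends on key_{i-1} and the
# tokens of block i, so equal keys imply an identical prefix up to that block.

BLOCK = 4  # tokens per block, kept small for the example

def block_keys(token_ids: list[int]) -> list[int]:
    """One key per *full* block; a partial trailing block gets no key."""
    keys, parent = [], 0
    full = len(token_ids) - len(token_ids) % BLOCK
    for i in range(0, full, BLOCK):
        parent = hash((parent, tuple(token_ids[i:i + BLOCK])))
        keys.append(parent)
    return keys


a = block_keys([1, 2, 3, 4, 5, 6, 7, 8])
b = block_keys([1, 2, 3, 4, 9, 9, 9, 9])
print(a[0] == b[0])  # True: same first block -> cached KV is reusable
print(a[1] == b[1])  # False: the prompts diverge in the second block
```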

The two pages here have very different shapes. /vllm is a short tour for someone who has heard of paged attention and wants to actually see it. /pic is a long-form walk-through of one specific piece of work - the spans / Legolink tests in vLLM's V1 KV cache - visualized one test at a time.

Inspired by poloclub/transformer-explainer. Source on GitHub.