April 24, 2026
Attention Is All You Need
Notes
Introduced the Transformer architecture. The paper that started everything.
IO-aware attention that is faster and uses less memory. Essential infrastructure.
The evolution of neural sequence prediction, and how it connects to classical methods