Thonk From First Principles

Sitemap - 2024 - Thonk From First Principles

FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention [external]

Strangely, Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data! [short]

Solutions: What Shapes Do Matrix Multiplications Like?

What Shapes Do Matrix Multiplications Like? [medium]

Supporting Mixtral in gpt-fast through torch.compile [short]

© 2025 Horace He