Sitemap - 2024 - Thonk From First Principles
FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention [external]
Strangely, Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data! [short]
Solutions: What Shapes Do Matrix Multiplications Like?
What Shapes Do Matrix Multiplications Like? [medium]
Supporting Mixtral in gpt-fast through torch.compile [short]