Thonk From First Principles
FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention [external]
Freeing users from the software lottery tyranny of fused attention implementations.
Aug 7 • Horace He
April 2024
Strangely, Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data! [short]
Great minds discuss flops per watt.
Apr 29 • Horace He
Solutions: What Shapes Do Matrix Multiplications Like?
Companion to https://www.thonking.ai/p/what-shapes-do-matrix-multiplications
Apr 8 • Horace He
What Shapes Do Matrix Multiplications Like? [medium]
Divining order from the chaos
Apr 1 • Horace He
February 2024
Supporting Mixtral in gpt-fast through torch.compile [short]
Long-form version of this tweet thread: https://twitter.com/cHHillee/status/1762269069351461196
Feb 26 • Horace He and Yanbo Liang