qazal
|
bfb2d1f89a
|
Revert "fp8 gemm speedup (#16236)" (#16245)
This reverts commit d95bf394e1.
|
2026-05-19 02:01:44 +09:00 |
|
qazal
|
d95bf394e1
|
fp8 gemm speedup (#16236)
* add asm_gemm option
* milestone
* work
* edit
* only the fast kernel
* diff
|
2026-05-17 04:58:28 +09:00 |
|
wozeparrot
|
e97f2c1114
|
llama: only gemm + fa custom kernel (#16180)
* llama: tie store to grad directly
* llama: set mp flags
* llama: non fused grad fp8 quantize path
|
2026-05-12 21:03:49 -07:00 |
|
wozeparrot
|
730fa66bf3
|
llama speed 6 (#16071)
|
2026-05-06 20:51:03 -07:00 |
|
wozeparrot
|
528d35e306
|
llama speed 4 (#15993)
|
2026-04-30 17:14:41 -07:00 |
|
wozeparrot
|
ef09071073
|
llama: speed 2 (#15960)
|
2026-04-28 20:44:37 -07:00 |
|
chenyu
|
9192c93b7e
|
Tensor.invalid -> Tensor.invalids (#15849)
matches ones and zeros, and avoids sharing a name with UOp.invalid
|
2026-04-21 11:19:51 -04:00 |
|
wozeparrot
|
9e60e4a7e7
|
llama: native fp8 (#15733)
|
2026-04-16 22:16:05 -07:00 |
|
wozeparrot
|
55bcd7cc9e
|
llama amax outside (#15670)
|
2026-04-09 23:08:03 -07:00 |
|
wozeparrot
|
70dbd35023
|
llama: move custom_kernel into flat_llama (#15643)
|
2026-04-08 00:19:14 -07:00 |
|
wozeparrot
|
7e54992bf6
|
fp8 llama (#15588)
Co-authored-by: qazal <qazal.software@gmail.com>
|
2026-04-04 18:24:57 -07:00 |
|
Christopher Milan
|
0ed8d9271d
|
Renderers accept Target or nothing (#15590)
|
2026-04-03 01:09:41 -04:00 |
|
qazal
|
fefb0ebc2a
|
gemm/asm: fp8 cleanups (#15580)
* normal gemm here
* s/dtypes.fp8e4m3/FP8_DTYPE
* gemm_bw
* device UOp stays NULL
|
2026-04-02 19:02:38 +09:00 |
|
qazal
|
8feb8edc68
|
gemm/asm: add fp8 support to cdna asm_gemm (#15542)
* work
* hmm, mixins
* rhs_transposed
* also fix the dtype
* check for hipcc
* Exception
* select dev
* default
|
2026-03-31 19:32:54 +09:00 |
|
chenyu
|
da1700e16b
|
dtypes.index -> dtypes.weakint (#15377)
|
2026-03-20 01:08:46 -04:00 |
|
George Hotz
|
4091d37e8e
|
flat llama step work (#15355)
* flat llama step work
* fp8 support
* blacklisted matmul
* chestertons fence
|
2026-03-20 09:06:12 +08:00 |
|
George Hotz
|
6e196195d8
|
add test for flat llama (#15327)
* add test for flat llama
* simpler
* back to split w1/w3
* env
* still too much ram
* invalid
|
2026-03-18 15:16:33 +08:00 |
|
qazal
|
5cd1daa3bc
|
cdna asm_gemm in one file, remove old rdna3 asm (#15281)
|
2026-03-16 04:32:30 +09:00 |
|