tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-13 16:37:04 +08:00

Author	SHA1	Message	Date
qazal	452c7d4230	llama: don't allocate grad_xw13 in bf16 (#16359)	2026-05-28 04:33:07 +09:00
qazal	eecd4706ff	fix mailbox comment, add types (#16360)	2026-05-25 22:24:00 +09:00
qazal	bbfe4f80ec	quantize_fp8 kernels in uops (#16288) * add tests * simple UOp kernel is n^2 * fast kernel matching c++, opts_to_apply=() * remove cpp * simple o(n) kernel, two passes * fuse the loops * works on DEV=CPU * multi regression test * fix multi, this can possibly be its own bugfix * test cleanups * minimal diff * match C in UOps * Revert "match C in UOps" This reverts commit `0bef740c30`. * edit test * match speed with C try 2 * needs_second_gpu * cleanup	2026-05-22 20:54:06 +09:00
wozeparrot	afc5bfa183	llama: remove fused grad accum (#16301)	2026-05-21 09:38:40 -07:00
qazal	1e0fffe256	fused ce llama kernel in UOps (#16263) * work * using uops * delete things * work * work * higher level uops * cleanups	2026-05-20 19:45:28 +09:00
wozeparrot	e97f2c1114	llama: only gemm + fa custom kernel (#16180) * llama: tie store to grad directly * llama: set mp flags * llama: non fused grad fp8 quantize path	2026-05-12 21:03:49 -07:00
wozeparrot	730fa66bf3	llama speed 6 (#16071)	2026-05-06 20:51:03 -07:00
wozeparrot	ab6218bc92	llama mp fixes (#16050)	2026-05-05 15:35:32 -07:00
wozeparrot	ef09071073	llama: speed 2 (#15960)	2026-04-28 20:44:37 -07:00
qazal	b3f0f8d349	llama: fix missing label_smoothing arg (#15955)	2026-04-29 02:12:14 +09:00
wozeparrot	5e861cd2c4	llama: move llama kernels to llama_kernels (#15952)	2026-04-27 22:48:53 -07:00

11 Commits