wozeparrot
c23652e486
llama: minimize peak init mem ( #16440 )
2026-05-29 18:00:37 -07:00
wozeparrot
36c8ff70c1
llama: use old scale for dequant in optim ( #16417 )
2026-05-28 15:21:19 -07:00
George Hotz
edca5df25a
flip offset and shape in pad and shrink ( #16414 )
...
* flip offset and shape in pad and shrink
* dumb test
2026-05-28 11:58:19 -07:00
George Hotz
8ee3a37524
shrink/pad use (new_shape, offset) ( #16405 )
...
* shrink uses offset and shape
* pad does too
* fix
2026-05-27 15:13:08 -07:00
chenyu
31424cda71
Tensor.requires_grad -> is_param ( #16325 )
...
for optimizer
2026-05-21 19:39:57 -04:00
wozeparrot
fb718a5e9d
llama: realize amax ( #16308 )
2026-05-21 14:00:48 -07:00
wozeparrot
afc5bfa183
llama: remove fused grad accum ( #16301 )
2026-05-21 09:38:40 -07:00
Christopher Milan
172f9493e1
move is_dtype_supported to renderer ( #16226 )
2026-05-20 21:19:37 -04:00
wozeparrot
825f30bf18
llama: apply_grad saves memory ( #16275 )
2026-05-20 13:14:06 -07:00
wozeparrot
361553c0a8
llama: match flat_llama with model_train ( #16269 )
2026-05-19 17:25:56 -07:00
wozeparrot
a3d59faef6
llama: don't save weight ( #16252 )
2026-05-18 17:05:45 -07:00
wozeparrot
159694347e
llama: fix running flat_llama ( #16224 )
2026-05-15 20:16:48 -07:00
chenyu
07a172dbbb
remove noop requires_grad_ calls ( #16213 )
2026-05-15 13:31:10 -04:00
chenyu
409bb0c9ad
requires_grad cannot be None ( #16212 )
...
final goal is to remove requires_grad, first change the default to True, and don't allow None
2026-05-15 02:01:04 -04:00
wozeparrot
b4d267dfd4
llama: only save when small ( #16208 )
2026-05-14 17:46:29 -07:00
wozeparrot
88ac2ac1fd
llama: cleanups ( #16189 )
2026-05-13 17:08:06 -07:00
wozeparrot
e9359d9e7d
more llama mp fixes ( #16151 )
...
* llama: SPLIT_W13
* llama: fix with no fused kernels
* llama: cast to bf16 on non asm_gemm path
* llama: new mp flags
2026-05-11 21:29:23 -07:00
wozeparrot
730fa66bf3
llama speed 6 ( #16071 )
2026-05-06 20:51:03 -07:00
wozeparrot
528d35e306
llama speed 4 ( #15993 )
2026-04-30 17:14:41 -07:00
wozeparrot
ef09071073
llama: speed 2 ( #15960 )
2026-04-28 20:44:37 -07:00
wozeparrot
5e861cd2c4
llama: move llama kernels to llama_kernels ( #15952 )
2026-04-27 22:48:53 -07:00
wozeparrot
d3cbd781d9
llama: use fused norm mul quantize for w13 ( #15878 )
2026-04-22 21:27:41 -07:00
wozeparrot
87378331e8
llama: fused mul quantize fp8 ( #15863 )
2026-04-21 20:58:37 -07:00
wozeparrot
f28ea84de2
llama: fused silu fp8 amax ( #15798 )
...
* llama: combined w13
* llama: fused swiglu+fp8
* llama: fix amax interleaving
* llama: don't need separate matmul
2026-04-19 12:03:55 +08:00
wozeparrot
06343092c8
llama: combined w13 ( #15803 )
2026-04-17 22:27:31 -07:00
wozeparrot
9e60e4a7e7
llama: native fp8 ( #15733 )
2026-04-16 22:16:05 -07:00
wozeparrot
480ad264a4
llama: per device amax ( #15735 )
2026-04-14 19:01:17 -07:00
wozeparrot
457508d5a0
llama: save more 2 ( #15681 )
2026-04-11 01:03:36 -07:00
wozeparrot
590464c8d8
llama: only support wqkv path + cleanups ( #15680 )
...
* llama: only support wqkv path + cleanups
* llama: missing transpose
2026-04-11 07:39:27 +08:00
wozeparrot
55bcd7cc9e
llama amax outside ( #15670 )
2026-04-09 23:08:03 -07:00
qazal
39a029ec55
remove ASM_GEMM context var ( #15645 )
2026-04-08 18:02:40 +09:00
wozeparrot
70dbd35023
llama: move custom_kernel into flat_llama ( #15643 )
2026-04-08 00:19:14 -07:00
wozeparrot
7e54992bf6
fp8 llama ( #15588 )
...
Co-authored-by: qazal <qazal.software@gmail.com>
2026-04-04 18:24:57 -07:00
wozeparrot
a65e958be9
llama: new apply_grad ( #15503 )
2026-03-26 19:39:25 -07:00
Christopher Milan
bc180a963c
deprecate <dev>=1 in favor of DEV=<dev> ( #15467 )
...
* start work on target
* add test
* update actions to use DEV
* update docs
* update readmes
* tests need that too
* update example
* update tests (comments)
* fix that test
* ruff
* mypy
* oops
* remove getenvs
* don't add Target yet
* and the test
* lint
* and docs
* more stuff
* assert
* few more fixes
* test assert
2026-03-26 03:48:03 -04:00
wozeparrot
da2031266a
llama: correct 8b init ( #15397 )
2026-03-24 13:41:41 -07:00
wozeparrot
87c4ec1724
llama: use flat llama ( #15353 )
2026-03-19 22:12:38 -07:00
George Hotz
4091d37e8e
flat llama step work ( #15355 )
...
* flat llama step work
* fp8 support
* blacklisted matmul
* chestertons fence
2026-03-20 09:06:12 +08:00
George Hotz
5524916e39
llama compute gradients explicitly + 243 GB of RAM on MP=8 ( #15343 )
...
* llama compute gradients explicitly
* apply grads
* fix multi issue
* multi BUFFER_VIEW support
* simpler
* skip the flaky test
2026-03-18 19:54:40 +08:00
George Hotz
6e196195d8
add test for flat llama ( #15327 )
...
* add test for flat llama
* simpler
* back to split w1/w3
* env
* still too much ram
* invalid
2026-03-18 15:16:33 +08:00
George Hotz
2605840ee2
flat llama ( #15324 )
...
* FlatTransformer
* works
* pass in buffer views
* print stuff
* print
* bugfixes
2026-03-17 19:39:55 +08:00
wozeparrot
a191ac0566
llama: use mlperf model ( #15257 )
2026-03-13 08:08:32 -07:00