38 Commits

Author SHA1 Message Date
wozeparrot
c23652e486 llama: minimize peak init mem (#16440) 2026-05-29 18:00:37 -07:00
wozeparrot
36c8ff70c1 llama: use old scale for dequant in optim (#16417) 2026-05-28 15:21:19 -07:00
George Hotz
edca5df25a flip offset and shape in pad and shrink (#16414)
* flip offset and shape in pad and shrink

* dumb test
2026-05-28 11:58:19 -07:00
George Hotz
8ee3a37524 shrink/pad use (new_shape, offset) (#16405)
* shrink uses offset and shape

* pad does too

* fix
2026-05-27 15:13:08 -07:00
chenyu
31424cda71 Tensor.requires_grad -> is_param (#16325)
for optimizer
2026-05-21 19:39:57 -04:00
wozeparrot
fb718a5e9d llama: realize amax (#16308) 2026-05-21 14:00:48 -07:00
wozeparrot
afc5bfa183 llama: remove fused grad accum (#16301) 2026-05-21 09:38:40 -07:00
wozeparrot
825f30bf18 llama: apply_grad saves memory (#16275) 2026-05-20 13:14:06 -07:00
wozeparrot
361553c0a8 llama: match flat_llama with model_train (#16269) 2026-05-19 17:25:56 -07:00
wozeparrot
a3d59faef6 llama: don't save weight (#16252) 2026-05-18 17:05:45 -07:00
wozeparrot
159694347e llama: fix running flat_llama (#16224) 2026-05-15 20:16:48 -07:00
chenyu
409bb0c9ad requires_grad cannot be None (#16212)
final goal is to remove requires_grad, first change the default to True, and don't allow None
2026-05-15 02:01:04 -04:00
wozeparrot
b4d267dfd4 llama: only save when small (#16208) 2026-05-14 17:46:29 -07:00
wozeparrot
88ac2ac1fd llama: cleanups (#16189) 2026-05-13 17:08:06 -07:00
wozeparrot
e9359d9e7d more llama mp fixes (#16151)
* llama: SPLIT_W13

* llama: fix with no fused kernels

* llama: cast to bf16 on non asm_gemm path

* llama: new mp flags
2026-05-11 21:29:23 -07:00
wozeparrot
730fa66bf3 llama speed 6 (#16071) 2026-05-06 20:51:03 -07:00
wozeparrot
528d35e306 llama speed 4 (#15993) 2026-04-30 17:14:41 -07:00
wozeparrot
ef09071073 llama: speed 2 (#15960) 2026-04-28 20:44:37 -07:00
wozeparrot
5e861cd2c4 llama: move llama kernels to llama_kernels (#15952) 2026-04-27 22:48:53 -07:00
wozeparrot
d3cbd781d9 llama: use fused norm mul quantize for w13 (#15878) 2026-04-22 21:27:41 -07:00
wozeparrot
87378331e8 llama: fused mul quantize fp8 (#15863) 2026-04-21 20:58:37 -07:00
wozeparrot
f28ea84de2 llama: fused silu fp8 amax (#15798)
* llama: combined w13

* llama: fused swiglu+fp8

* llama: fix amax interleaving

* llama: don't need separate matmul
2026-04-19 12:03:55 +08:00
wozeparrot
06343092c8 llama: combined w13 (#15803) 2026-04-17 22:27:31 -07:00
wozeparrot
9e60e4a7e7 llama: native fp8 (#15733) 2026-04-16 22:16:05 -07:00
wozeparrot
480ad264a4 llama: per device amax (#15735) 2026-04-14 19:01:17 -07:00
wozeparrot
457508d5a0 llama: save more 2 (#15681) 2026-04-11 01:03:36 -07:00
wozeparrot
590464c8d8 llama: only support wqkv path + cleanups (#15680)
* llama: only support wqkv path + cleanups

* llama: missing transpose
2026-04-11 07:39:27 +08:00
wozeparrot
55bcd7cc9e llama amax outside (#15670) 2026-04-09 23:08:03 -07:00
qazal
39a029ec55 remove ASM_GEMM context var (#15645) 2026-04-08 18:02:40 +09:00
wozeparrot
70dbd35023 llama: move custom_kernel into flat_llama (#15643) 2026-04-08 00:19:14 -07:00
wozeparrot
7e54992bf6 fp8 llama (#15588)
Co-authored-by: qazal <qazal.software@gmail.com>
2026-04-04 18:24:57 -07:00
wozeparrot
a65e958be9 llama: new apply_grad (#15503) 2026-03-26 19:39:25 -07:00
wozeparrot
da2031266a llama: correct 8b init (#15397) 2026-03-24 13:41:41 -07:00
wozeparrot
87c4ec1724 llama: use flat llama (#15353) 2026-03-19 22:12:38 -07:00
George Hotz
4091d37e8e flat llama step work (#15355)
* flat llama step work

* fp8 support

* blacklisted matmul

* chestertons fence
2026-03-20 09:06:12 +08:00
George Hotz
5524916e39 llama compute gradients explicitly + 243 GB of RAM on MP=8 (#15343)
* llama compute gradients explicitly

* apply grads

* fix multi issue

* multi BUFFER_VIEW support

* simpler

* skip the flaky test
2026-03-18 19:54:40 +08:00
George Hotz
6e196195d8 add test for flat llama (#15327)
* add test for flat llama

* simpler

* back to split w1/w3

* env

* still too much ram

* invalid
2026-03-18 15:16:33 +08:00
George Hotz
2605840ee2 flat llama (#15324)
* FlatTransformer

* works

* pass in buffer views

* print stuff

* print

* bugfixes
2026-03-17 19:39:55 +08:00