wozeparrot
|
c23652e486
|
llama: minimize peak init mem (#16440)
|
2026-05-29 18:00:37 -07:00 |
|
wozeparrot
|
36c8ff70c1
|
llama: use old scale for dequant in optim (#16417)
|
2026-05-28 15:21:19 -07:00 |
|
George Hotz
|
edca5df25a
|
flip offset and shape in pad and shrink (#16414)
* flip offset and shape in pad and shrink
* dumb test
|
2026-05-28 11:58:19 -07:00 |
|
George Hotz
|
8ee3a37524
|
shrink/pad use (new_shape, offset) (#16405)
* shrink uses offset and shape
* pad does too
* fix
|
2026-05-27 15:13:08 -07:00 |
|
chenyu
|
31424cda71
|
Tensor.requires_grad -> is_param (#16325)
for optimizer
|
2026-05-21 19:39:57 -04:00 |
|
wozeparrot
|
fb718a5e9d
|
llama: realize amax (#16308)
|
2026-05-21 14:00:48 -07:00 |
|
wozeparrot
|
afc5bfa183
|
llama: remove fused grad accum (#16301)
|
2026-05-21 09:38:40 -07:00 |
|
wozeparrot
|
825f30bf18
|
llama: apply_grad saves memory (#16275)
|
2026-05-20 13:14:06 -07:00 |
|
wozeparrot
|
361553c0a8
|
llama: match flat_llama with model_train (#16269)
|
2026-05-19 17:25:56 -07:00 |
|
wozeparrot
|
a3d59faef6
|
llama: don't save weight (#16252)
|
2026-05-18 17:05:45 -07:00 |
|
wozeparrot
|
159694347e
|
llama: fix running flat_llama (#16224)
|
2026-05-15 20:16:48 -07:00 |
|
chenyu
|
409bb0c9ad
|
requires_grad cannot be None (#16212)
final goal is to remove requires_grad; first, change the default to True and don't allow None
|
2026-05-15 02:01:04 -04:00 |
|
wozeparrot
|
b4d267dfd4
|
llama: only save when small (#16208)
|
2026-05-14 17:46:29 -07:00 |
|
wozeparrot
|
88ac2ac1fd
|
llama: cleanups (#16189)
|
2026-05-13 17:08:06 -07:00 |
|
wozeparrot
|
e9359d9e7d
|
more llama mp fixes (#16151)
* llama: SPLIT_W13
* llama: fix with no fused kernels
* llama: cast to bf16 on non asm_gemm path
* llama: new mp flags
|
2026-05-11 21:29:23 -07:00 |
|
wozeparrot
|
730fa66bf3
|
llama speed 6 (#16071)
|
2026-05-06 20:51:03 -07:00 |
|
wozeparrot
|
528d35e306
|
llama speed 4 (#15993)
|
2026-04-30 17:14:41 -07:00 |
|
wozeparrot
|
ef09071073
|
llama: speed 2 (#15960)
|
2026-04-28 20:44:37 -07:00 |
|
wozeparrot
|
5e861cd2c4
|
llama: move llama kernels to llama_kernels (#15952)
|
2026-04-27 22:48:53 -07:00 |
|
wozeparrot
|
d3cbd781d9
|
llama: use fused norm mul quantize for w13 (#15878)
|
2026-04-22 21:27:41 -07:00 |
|
wozeparrot
|
87378331e8
|
llama: fused mul quantize fp8 (#15863)
|
2026-04-21 20:58:37 -07:00 |
|
wozeparrot
|
f28ea84de2
|
llama: fused silu fp8 amax (#15798)
* llama: combined w13
* llama: fused swiglu+fp8
* llama: fix amax interleaving
* llama: don't need separate matmul
|
2026-04-19 12:03:55 +08:00 |
|
wozeparrot
|
06343092c8
|
llama: combined w13 (#15803)
|
2026-04-17 22:27:31 -07:00 |
|
wozeparrot
|
9e60e4a7e7
|
llama: native fp8 (#15733)
|
2026-04-16 22:16:05 -07:00 |
|
wozeparrot
|
480ad264a4
|
llama: per device amax (#15735)
|
2026-04-14 19:01:17 -07:00 |
|
wozeparrot
|
457508d5a0
|
llama: save more 2 (#15681)
|
2026-04-11 01:03:36 -07:00 |
|
wozeparrot
|
590464c8d8
|
llama: only support wqkv path + cleanups (#15680)
* llama: only support wqkv path + cleanups
* llama: missing transpose
|
2026-04-11 07:39:27 +08:00 |
|
wozeparrot
|
55bcd7cc9e
|
llama amax outside (#15670)
|
2026-04-09 23:08:03 -07:00 |
|
qazal
|
39a029ec55
|
remove ASM_GEMM context var (#15645)
|
2026-04-08 18:02:40 +09:00 |
|
wozeparrot
|
70dbd35023
|
llama: move custom_kernel into flat_llama (#15643)
|
2026-04-08 00:19:14 -07:00 |
|
wozeparrot
|
7e54992bf6
|
fp8 llama (#15588)
Co-authored-by: qazal <qazal.software@gmail.com>
|
2026-04-04 18:24:57 -07:00 |
|
wozeparrot
|
a65e958be9
|
llama: new apply_grad (#15503)
|
2026-03-26 19:39:25 -07:00 |
|
wozeparrot
|
da2031266a
|
llama: correct 8b init (#15397)
|
2026-03-24 13:41:41 -07:00 |
|
wozeparrot
|
87c4ec1724
|
llama: use flat llama (#15353)
|
2026-03-19 22:12:38 -07:00 |
|
George Hotz
|
4091d37e8e
|
flat llama step work (#15355)
* flat llama step work
* fp8 support
* blacklisted matmul
* Chesterton's fence
|
2026-03-20 09:06:12 +08:00 |
|
George Hotz
|
5524916e39
|
llama compute gradients explicitly + 243 GB of RAM on MP=8 (#15343)
* llama compute gradients explicitly
* apply grads
* fix multi issue
* multi BUFFER_VIEW support
* simpler
* skip the flaky test
|
2026-03-18 19:54:40 +08:00 |
|
George Hotz
|
6e196195d8
|
add test for flat llama (#15327)
* add test for flat llama
* simpler
* back to split w1/w3
* env
* still too much ram
* invalid
|
2026-03-18 15:16:33 +08:00 |
|
George Hotz
|
2605840ee2
|
flat llama (#15324)
* FlatTransformer
* works
* pass in buffer views
* print stuff
* print
* bugfixes
|
2026-03-17 19:39:55 +08:00 |
|