chenyu | dcee90aa3f | remove requires_grad use in extra/examples (#16238), except the ones fed into optimizer | 2026-05-16 18:40:26 -04:00
wozeparrot | ab6218bc92 | llama mp fixes (#16050) | 2026-05-05 15:35:32 -07:00
wozeparrot | 9e60e4a7e7 | llama: native fp8 (#15733) | 2026-04-16 22:16:05 -07:00
wozeparrot | 1ca178f379 | llama: stochastic rounding (#15456) | 2026-03-25 18:16:31 -07:00
wozeparrot | da2031266a | llama: correct 8b init (#15397) | 2026-03-24 13:41:41 -07:00
wozeparrot | 87c4ec1724 | llama: use flat llama (#15353) | 2026-03-19 22:12:38 -07:00
wozeparrot | 749162bd2f | llama memory tweaks (#15223) | 2026-03-12 12:36:23 -07:00
wozeparrot | 4544da1c54 | llama3 fixes part3 (#15152) | 2026-03-05 01:17:54 -08:00
wozeparrot | 824ba4386a | llama3 dp fix (#15098) | 2026-03-02 22:43:07 -08:00
wozeparrot | a4f6365929 | llama3: fstep takes grads (#15069) | 2026-03-01 20:05:07 -08:00
wozeparrot | a36a26d4ed | llama3: optim does grad acc in correct order (#14965) | 2026-02-23 22:25:13 -08:00
wozeparrot | 3cda781876 | llama optim offload (#14901) | 2026-02-21 08:53:45 -08:00
wozeparrot | 95e97ec341 | separate llama optim (#14810) | 2026-02-17 13:02:35 -08:00