| Author | Commit | Message | Date |
|---|---|---|---|
| wozeparrot | f11f63007d | llama: immediate scaling on flag (#16494) | 2026-06-04 10:30:00 -07:00 |
| wozeparrot | 7dcfd144b6 | llama: columnwise fp8 scaling (#16480) | 2026-06-02 18:55:45 -07:00 |
| wozeparrot | 6787de9f52 | llama: fix mp (#16434) | 2026-05-29 11:21:43 -07:00 |
| wozeparrot | f86966af56 | llama: optim amax margin (#16425) | 2026-05-28 20:18:11 -07:00 |
| wozeparrot | 36c8ff70c1 | llama: use old scale for dequant in optim (#16417) | 2026-05-28 15:21:19 -07:00 |
| wozeparrot | dac3743d75 | llama: delayed scaling in optim (#16407) | 2026-05-27 15:40:03 -07:00 |
| wozeparrot | 68d2102fd2 | llama: offload master weights (#16355) | 2026-05-25 08:48:13 -07:00 |
| chenyu | dcee90aa3f | remove requires_grad use in extra/examples (#16238) except the ones fed into optimizer | 2026-05-16 18:40:26 -04:00 |
| wozeparrot | ab6218bc92 | llama mp fixes (#16050) | 2026-05-05 15:35:32 -07:00 |
| wozeparrot | 9e60e4a7e7 | llama: native fp8 (#15733) | 2026-04-16 22:16:05 -07:00 |
| wozeparrot | 1ca178f379 | llama: stochastic rounding (#15456) | 2026-03-25 18:16:31 -07:00 |
| wozeparrot | da2031266a | llama: correct 8b init (#15397) | 2026-03-24 13:41:41 -07:00 |
| wozeparrot | 87c4ec1724 | llama: use flat llama (#15353) | 2026-03-19 22:12:38 -07:00 |
| wozeparrot | 749162bd2f | llama memory tweaks (#15223) | 2026-03-12 12:36:23 -07:00 |
| wozeparrot | 4544da1c54 | llama3 fixes part3 (#15152) | 2026-03-05 01:17:54 -08:00 |
| wozeparrot | 824ba4386a | llama3 dp fix (#15098) | 2026-03-02 22:43:07 -08:00 |
| wozeparrot | a4f6365929 | llama3: fstep takes grads (#15069) | 2026-03-01 20:05:07 -08:00 |
| wozeparrot | a36a26d4ed | llama3: optim does grad acc in correct order (#14965) | 2026-02-23 22:25:13 -08:00 |
| wozeparrot | 3cda781876 | llama optim offload (#14901) | 2026-02-21 08:53:45 -08:00 |
| wozeparrot | 95e97ec341 | seperate llama optim (#14810) | 2026-02-17 13:02:35 -08:00 |