qazal
1e0fffe256
fused ce llama kernel in UOps (#16263)
* work
* using uops
* delete things
* work
* work
* higher level uops
* cleanups
2026-05-20 19:45:28 +09:00
nimlgen
b3dcf8f452
hcq2: split into schedule/realize (#16216)
* hcq2: split into schedule/realize
* missing
* x
* f
* clean
* cleaner
* x
* x
* x
* x
* x
2026-05-19 16:40:17 +03:00
qazal
e4350e7de9
set hipcc mac docker to 7.1 (#16261)
* set hipcc mac docker to 7.1
* pull from amd
2026-05-19 21:30:39 +09:00
qazal
bfb2d1f89a
Revert "fp8 gemm speedup (#16236)" (#16245)
This reverts commit d95bf394e1.
2026-05-19 02:01:44 +09:00
chenyu
dcee90aa3f
remove requires_grad use in extra/examples (#16238)
except the ones fed into optimizer
2026-05-16 18:40:26 -04:00
qazal
d95bf394e1
fp8 gemm speedup (#16236)
* add asm_gemm option
* milestone
* work
* edit
* only the fast kernel
* diff
2026-05-17 04:58:28 +09:00
qazal
d54fa86b71
viz/cli: select all calls in graph by default (#16214)
2026-05-15 21:01:44 +09:00
nimlgen
28b98e529d
nv: move structs to vram (#16184)
* nv: vram
* x
* 4090
* x
* move and sysmem on macos
* x
* remove hp
2026-05-15 13:41:42 +03:00
chenyu
409bb0c9ad
requires_grad cannot be None (#16212)
final goal is to remove requires_grad, first change the default to True, and don't allow None
2026-05-15 02:01:04 -04:00
nimlgen
62ea73719d
hcq2: share more with graph (#16196)
* share more with graph
* comment
2026-05-14 22:28:11 +03:00
nimlgen
ad1fb7c981
hcq2: graph (#16186)
* keep this for now
* early graph
2026-05-13 22:49:43 +03:00
wozeparrot
e97f2c1114
llama: only gemm + fa custom kernel (#16180)
* llama: tie store to grad directly
* llama: set mp flags
* llama: non fused grad fp8 quantize path
2026-05-12 21:03:49 -07:00
Christopher Milan
316607f004
dsp: don't use docker in ci (#16167)
* dsp: don't use docker in ci
* add setup script for macos docker
2026-05-12 17:11:03 -04:00
nimlgen
e93fb5f9b9
hcq2: remove hcqprogram (#16157)
* hcq2 rm program
* nonbeauty
* no prog
* tiny
* f
* x
2026-05-12 18:49:13 +03:00
nimlgen
e5729935c6
time_call (#16152)
* time_call
* x
* fix caches
2026-05-12 16:58:28 +03:00
chenyu
3942a80f66
fix wrong kwargs passed into rands (#16149)
working towards explicit args for these
2026-05-11 22:22:06 -04:00
Vikram Rangarajan
effa263865
Torch backend aten::cat.out fix (#16121)
* Handle empty 1D tensors in cat_out
* Undid other changes
* Fixed torch cat
* Improved cat.out, added more tests
* Cleaned code
* Type hinted dim
* Removed whitespace
2026-05-11 16:28:16 -07:00
qazal
fc2cc1d77a
viz: call graph renderer example (#16141)
* work
* emits
* this
* cleaner repr for custom binaries
* --call-graph
* _ref
* this
* start
* this
* everything except the pyrender
* bring pyrender back
2026-05-12 05:07:30 +09:00
nimlgen
70c2480e71
hcq2 to extra (#16126)
* hcq2 in extra
* correct
* some revert from non-extra
* cln
* cpu
* x
* attach
* min
* remove attach
* linter
2026-05-11 17:17:30 +03:00
nimlgen
ad9738892c
get_buf() for Buffer (#16134)
* p
* mypy
* x
2026-05-11 16:36:14 +03:00
Christopher Milan
faabe6aa42
nv: remaining firmware from /lib/firmware (#16088)
2026-05-07 23:07:43 -04:00
Christopher Milan
9a6f7f7576
nv: look for fmc firmware in /lib/firmware (#16080)
2026-05-07 18:08:27 -04:00
nimlgen
2f0aa884d5
tinygpu: minimal is macos13 for resets (#16075)
2026-05-07 21:25:56 +03:00
wozeparrot
730fa66bf3
llama speed 6 (#16071)
2026-05-06 20:51:03 -07:00
wozeparrot
ab6218bc92
llama mp fixes (#16050)
2026-05-05 15:35:32 -07:00
wozeparrot
528d35e306
llama speed 4 (#15993)
2026-04-30 17:14:41 -07:00
wozeparrot
eddcd4723b
am_smi throttle info (#15997)
2026-04-30 15:28:32 -07:00
nimlgen
dfd2d07005
remove CompiledRunner (#15970)
* rm usage of CompiledRunner
* more tests
* last
* linter
* sink
* remove
* linter
2026-04-29 22:45:48 +03:00
qazal
a37b605523
remove arch from asm kernel class (#15977)
* rm arch from kernel
* update other tests
* update abstractions4.py
2026-04-30 03:39:52 +09:00
qazal
b63e0a5f74
viz/sqtt: move amd decoder to extra, don't import from ops_amd (#15969)
* don't import from ops_amd
* start
* cleanup
2026-04-30 00:49:15 +09:00
wozeparrot
ef09071073
llama: speed 2 (#15960)
2026-04-28 20:44:37 -07:00
Christopher Milan
e6863a1cc5
autogen: fewer type: ignores (#15956)
2026-04-28 21:58:13 -04:00
nimlgen
77965a22e5
local optimize as rewrite (#15953)
* local optimize as rewrite
* better
* x
* slightly rename
* fix
* ugh
* remove
* x
* remove
* not weak
2026-04-28 22:51:04 +03:00
qazal
b3f0f8d349
llama: fix missing label_smoothing arg (#15955)
2026-04-29 02:12:14 +09:00
wozeparrot
5e861cd2c4
llama: move llama kernels to llama_kernels (#15952)
2026-04-27 22:48:53 -07:00
nimlgen
4164666c72
programinfo (#15942)
* programinfo
* fix
* m
* x
* x
* changes
* x
* fix
* rm
2026-04-27 23:12:03 +03:00
qazal
8c174bdad4
viz/sqtt: correct exec pipes (#15885)
* wmma
* p2
* test
* left
* work
* pickle
* handwritten failing tests
* start work
* test the pipes
* empirical evidence
* update rdna4 enum types
* VALU pipe 1
* TRANSCENDENTAL pipe
* transcendental function units
* reorder
* wmma pipe
* cleanup and notes
* smaller
* work
* diff cleanup
* pickle
* use se:1
* int
2026-04-28 05:05:49 +09:00
nimlgen
bb652352c7
remove execitem (#15932)
* remove execitem
* f
* x
2026-04-25 19:33:04 +03:00
nimlgen
768106a542
remove schedule from extra/docs/examples (#15929)
* remove schedule from extra/docs/examples
* f
2026-04-25 14:09:12 +03:00
Denys Melnyk
1fdcb13bfb
webgpu: fix weight lookup in export_model after compile_net key change (#15919)
* fix lookup site in export_model_webgpu after refactoring
webgpu (sd): fix export_model weight lookup after compile_net changes
fix lookup site in export_model_webgpu after refactoring
* add regression test
2026-04-25 10:04:55 +03:00
wozeparrot
4b908b6e2c
llama: fused ce loss (#15920)
2026-04-24 20:01:24 -07:00
nimlgen
f2751955cb
remove linear_to_schedule from tests (#15912)
* remove linear_to_schedule from tests
* x
2026-04-24 20:02:10 +03:00
qazal
f379b5a40a
sqtt: match amd's TS_DELTA_SHORT offset (#15901)
2026-04-24 06:41:22 +03:00
wozeparrot
d3cbd781d9
llama: use fused norm mul quantize for w13 (#15878)
2026-04-22 21:27:41 -07:00
nimlgen
e5891acab2
jit: precompile (#15848)
* x
* jit: precompile as sep step
* x
* s
* x
* x
* x
* ?
* ?
* x
* x
* viz
* f
* x
* u
* x
* x
2026-04-23 00:23:32 +03:00
wozeparrot
87378331e8
llama: fused mul quantize fp8 (#15863)
2026-04-21 20:58:37 -07:00
chenyu
9192c93b7e
Tensor.invalid -> Tensor.invalids (#15849)
matches ones and zeros, and to not share name with UOp.invalid
2026-04-21 11:19:51 -04:00
nimlgen
bfe28ee2ad
rm run_schedule (#15847)
2026-04-21 18:14:30 +03:00
nimlgen
ae9b84d32f
rm beam uop (#15844)
2026-04-21 13:10:26 +03:00
qazal
f9655af2a3
viz/cli: move to tinygrad (#15835)
* move cli
* update imports
* cleanup the readme
* edit
* work
* details
* python -m tinygrad.viz.cli
* do not execv in non tty
* option
* lint
* simpler
* gemm pmc
2026-04-21 13:35:10 +09:00