George Hotz
d810bd2b41
Merge branch 'master' into move_gates_to_load_store
2026-05-04 17:05:21 -07:00
George Hotz
09ec34437d
fix oob validation
2026-05-04 16:55:32 -07:00
chenyu
a357a0449a
Tensor.div cleanup ( #16041 )
2026-05-04 19:27:36 -04:00
George Hotz
36383298be
move gates to load/store
2026-05-04 14:56:37 -07:00
George Hotz
8f397f5c7c
move load gates
2026-05-04 14:45:42 -07:00
nimlgen
5b4f62519d
cache buffer_views as well ( #16039 )
...
* cache buffer_views as well
* reuse
* back
* x
2026-05-05 00:00:09 +03:00
Christopher Milan
8e99c4f097
fetch checks sha256 ( #16037 )
2026-05-04 16:08:38 -04:00
George Hotz
1884f67a39
simplify full_rewrite_to_sink spec ( #16035 )
...
* simplify full_rewrite_to_sink spec
* test cleanups
2026-05-04 11:44:13 -07:00
chenyu
a4fccd23b2
remove kwargs in UOp.vectorize [pr] ( #16034 )
2026-05-04 12:46:38 -04:00
qazal
b1d88ebf02
viz/cli: aggregate flops in -t ( #16031 )
...
* 38
* plumbing
* more flops
* flop/s and bytes/s
* arithmetic mean
* tests
* harmonic mean
* range
* better
* simplify
* fix prints
* no string parsing needed
2026-05-04 17:35:02 +03:00
qazal
c02e390c2b
viz: encode flops, mem and metadata in json ( #16032 )
...
* gate print
* update everywhere to check path
* server encodes json
* ui changes
* cli changes
* tests never need regex
* no str replace
* update test_pipes
* remove that
2026-05-04 23:06:18 +09:00
bigyoshi
4024d8438f
runtime/graph: avoid core_id runtimevar merge conflicts ( #16026 )
...
Co-authored-by: bigyoshi51 <269989564+bigyoshi51@users.noreply.github.com>
2026-05-03 19:16:02 +03:00
qazal
9684334dfe
viz: fix flops in graph, add null graph tracing ( #16024 )
...
* min repro, todos
* null graph tracing
* work
* work
* work
* only test_flops
* exec points back
* first
* better
* integral timestamps maybe
* cleanup
* simpler, update NULL to use SDMA naming
* integration test
* sdma
2026-05-03 22:32:44 +09:00
wozeparrot
419d525553
feat: handle multioutput kernel grads ( #16028 )
2026-05-02 22:31:45 -07:00
mefengl
9717d3a3a2
hotfix: prepend LD_LIBRARY_PATH to DLL posix search dirs ( #16023 )
2026-05-02 20:45:19 +03:00
qazal
7daf4b7d52
viz: split cli test ( #16015 )
...
* viz: split cli test
* arg3 is msg
2026-05-03 01:47:11 +09:00
nimlgen
d65b8ca25f
jit: remove *input_list from the graph sources ( #16021 )
2026-05-02 14:42:47 +03:00
qazal
7dae9e6f7f
viz: keep VIZ.value = 0 during python shutdown, cleanup launch ( #16022 )
...
* viz: keep VIZ.value = 0 during python shutdown, cleaner execv
* rm
2026-05-02 20:35:53 +09:00
Christopher Milan
637bdd5530
am: only support CDNA3/4 and RDNA3/4 ( #16017 )
2026-05-02 00:02:14 -04:00
George Hotz
4a2e1f1076
STORE doesn't have ranges anymore ( #16019 )
...
* STORE doesn't have ranges anymore
* fix
2026-05-01 15:00:27 -07:00
chenyu
0bffbc5f8a
onnx fmod uses fmod ( #16018 )
2026-05-01 16:47:11 -04:00
chenyu
782d1ff80f
Tensor.fmod ( #16014 )
...
c-style mod matches torch
2026-05-01 16:02:18 -04:00
nimlgen
1079441332
revoke bus master ( #16007 )
2026-05-01 18:00:01 +03:00
qazal
8b147a9ed5
minimal repro for llama copies 2 ( #16011 )
2026-05-01 22:23:47 +09:00
qazal
a29dd7b19b
Revert "cleanup: untrack wait Metal buffers ( #15954 )" ( #16010 )
...
* Revert "cleanup: untrack wait Metal buffers (#15954)"
This reverts commit 5eb1fd5d3c.
* regression test fixes
2026-05-01 21:18:19 +09:00
qazal
65879fe1b7
metal synchronize regression test ( #16008 )
...
* add test for metal wait=True
* add self.assertRaises
2026-05-01 20:10:57 +09:00
nimlgen
f6d92b55e6
am: use per pipe reset for gfx11+ ( #16006 )
2026-05-01 12:56:43 +03:00
Christopher Milan
cee73becbe
am: ip offsets in autogen ( #16003 )
2026-05-01 00:13:52 -04:00
George Hotz
4506688285
split render to render.py ( #16002 )
...
* split render to render.py
* move more print
2026-04-30 19:41:14 -07:00
George Hotz
d651b4bbf0
SPEC=3 checks the shape ( #16001 )
...
* SPEC=3 checks the shape
* buffer view
* Revert "buffer view"
This reverts commit ffd87889a9.
* buffer view hack
* fix ptx
2026-04-30 18:41:37 -07:00
wozeparrot
528d35e306
llama speed 4 ( #15993 )
2026-04-30 17:14:41 -07:00
George Hotz
45fd7a3668
lil_image vectorize ( #16000 )
...
* lil_image vectorize
* 0 pitch on height 1
* Revert "0 pitch on height 1"
This reverts commit 58a83e6622.
2026-04-30 16:12:43 -07:00
wozeparrot
eddcd4723b
am_smi throttle info ( #15997 )
2026-04-30 15:28:32 -07:00
chenyu
52c92e15ae
no replacement multinomial ( #15995 )
...
* no replacement multinomial
Efraimidis–Spirakis
* num_samples == 1 can use fast path
2026-04-30 17:35:26 -04:00
chenyu
e0b09f288f
input validation for rand functions ( #15990 )
2026-04-30 14:00:44 -04:00
nimlgen
11e1a2b89f
cleaner and faster run_linear ( #15987 )
...
* cleaner and faster run_linear
* x
* assert for now
* x
* x
* sym_infer
* remove sink
2026-04-30 20:15:22 +03:00
qazal
58b34e71bd
failing test for llama useless copies ( #15989 )
2026-05-01 00:55:29 +09:00
George Hotz
0f7e296f5b
fix some indexing edge cases ( #15988 )
2026-04-30 08:05:30 -07:00
nimlgen
6f8b10d251
remove base Runner ( #15986 )
...
* remove base Runner
* linters
2026-04-30 13:04:55 +03:00
George Hotz
46a36a838a
small dtype shapes fixups ( #15984 )
2026-04-29 19:40:38 -07:00
chenyu
b73248958a
minor rand cleanups ( #15982 )
2026-04-29 22:22:29 -04:00
chenyu
53a28bafbd
rand device seed to its own function ( #15979 )
2026-04-29 17:21:40 -04:00
Christopher Milan
d07741f1d7
am: look for firmware in /lib/firmware/amdgpu ( #15974 )
2026-04-29 17:15:09 -04:00
nimlgen
c73e667fc0
remove if for precompiled programs ( #15980 )
2026-04-29 23:43:36 +03:00
qazal
55915584e5
viz: fix cfg for emulated amd on the null device ( #15976 )
...
* simple failing when i test it end to end
* pass
* linter
* assemble
2026-04-30 05:18:09 +09:00
nimlgen
dfd2d07005
remove CompiledRunner ( #15970 )
...
* rm usage of CompiledRunner
* more tests
* last
* linter
* sink
* remove
* linter
2026-04-29 22:45:48 +03:00
wozeparrot
0080489abe
llama: use env vars ( #15978 )
2026-04-29 12:37:15 -07:00
qazal
a37b605523
remove arch from asm kernel class ( #15977 )
...
* rm arch from kernel
* update other tests
* update abstractions4.py
2026-04-30 03:39:52 +09:00
Christopher Milan
7a79c2948a
DEV visible device filter supports hyphenated syntax ( #15971 )
2026-04-29 14:02:21 -04:00
Christopher Milan
6b9a45568c
autogen: better version handling for llvm and libclang ( #15975 )
2026-04-29 14:01:33 -04:00