13183 Commits

Author SHA1 Message Date
George Hotz
d810bd2b41 Merge branch 'master' into move_gates_to_load_store 2026-05-04 17:05:21 -07:00
George Hotz
09ec34437d fix oob validation 2026-05-04 16:55:32 -07:00
chenyu
a357a0449a Tensor.div cleanup (#16041) 2026-05-04 19:27:36 -04:00
George Hotz
36383298be move gates to load/store 2026-05-04 14:56:37 -07:00
George Hotz
8f397f5c7c move load gates 2026-05-04 14:45:42 -07:00
nimlgen
5b4f62519d cache buffer_views as well (#16039)
* cache buffer_views as well

* reuse

* back

* x
2026-05-05 00:00:09 +03:00
Christopher Milan
8e99c4f097 fetch checks sha256 (#16037) 2026-05-04 16:08:38 -04:00
George Hotz
1884f67a39 simplify full_rewrite_to_sink spec (#16035)
* simplify full_rewrite_to_sink spec

* test cleanups
2026-05-04 11:44:13 -07:00
chenyu
a4fccd23b2 remove kwargs in UOp.vectorize [pr] (#16034) 2026-05-04 12:46:38 -04:00
qazal
b1d88ebf02 viz/cli: aggregate flops in -t (#16031)
* 38

* plumbing

* more flops

* flop/s and bytes/s

* arithmetic mean

* tests

* harmonic mean

* range

* better

* simplify

* fix prints

* no string parsing needed
2026-05-04 17:35:02 +03:00
qazal
c02e390c2b viz: encode flops, mem and metadata in json (#16032)
* gate print

* update everywhere to check path

* server encodes json

* ui changes

* cli changes

* tests never need regex

* no str replace

* update test_pipes

* remove that
2026-05-04 23:06:18 +09:00
bigyoshi
4024d8438f runtime/graph: avoid core_id runtimevar merge conflicts (#16026)
Co-authored-by: bigyoshi51 <269989564+bigyoshi51@users.noreply.github.com>
2026-05-03 19:16:02 +03:00
qazal
9684334dfe viz: fix flops in graph, add null graph tracing (#16024)
* min repro, todos

* null graph tracing

* work

* work

* work

* only test_flops

* exec points back

* first

* better

* integral timestamps maybe

* cleanup

* simpler, update NULL to use SDMA naming

* integration test

* sdma
2026-05-03 22:32:44 +09:00
wozeparrot
419d525553 feat: handle multioutput kernel grads (#16028) 2026-05-02 22:31:45 -07:00
mefengl
9717d3a3a2 hotfix: prepend LD_LIBRARY_PATH to DLL posix search dirs (#16023) 2026-05-02 20:45:19 +03:00
qazal
7daf4b7d52 viz: split cli test (#16015)
* viz: split cli test

* arg3 is msg
2026-05-03 01:47:11 +09:00
nimlgen
d65b8ca25f jit: remove *input_list from the graph sources (#16021) 2026-05-02 14:42:47 +03:00
qazal
7dae9e6f7f viz: keep VIZ.value = 0 during python shutdown, cleanup launch (#16022)
* viz: keep VIZ.value = 0 during python shutdown, cleaner execv

* rm
2026-05-02 20:35:53 +09:00
Christopher Milan
637bdd5530 am: only support CDNA3/4 and RDNA3/4 (#16017) 2026-05-02 00:02:14 -04:00
George Hotz
4a2e1f1076 STORE doesn't have ranges anymore (#16019)
* STORE doesn't have ranges anymore

* fix
2026-05-01 15:00:27 -07:00
chenyu
0bffbc5f8a onnx fmod uses fmod (#16018) 2026-05-01 16:47:11 -04:00
chenyu
782d1ff80f Tensor.fmod (#16014)
c-style mod matches torch
2026-05-01 16:02:18 -04:00
nimlgen
1079441332 revoke bus master (#16007) 2026-05-01 18:00:01 +03:00
qazal
8b147a9ed5 minimal repro for llama copies 2 (#16011) 2026-05-01 22:23:47 +09:00
qazal
a29dd7b19b Revert "cleanup: untrack wait Metal buffers (#15954)" (#16010)
* Revert "cleanup: untrack wait Metal buffers (#15954)"

This reverts commit 5eb1fd5d3c.

* regression test fixes
2026-05-01 21:18:19 +09:00
qazal
65879fe1b7 metal synchronize regression test (#16008)
* add test for metal wait=True

* add self.assertRaises
2026-05-01 20:10:57 +09:00
nimlgen
f6d92b55e6 am: use per pipe reset for gfx11+ (#16006) 2026-05-01 12:56:43 +03:00
Christopher Milan
cee73becbe am: ip offsets in autogen (#16003) 2026-05-01 00:13:52 -04:00
George Hotz
4506688285 split render to render.py (#16002)
* split render to render.py

* move more print
2026-04-30 19:41:14 -07:00
George Hotz
d651b4bbf0 SPEC=3 checks the shape (#16001)
* SPEC=3 checks the shape

* buffer view

* Revert "buffer view"

This reverts commit ffd87889a9.

* buffer view hack

* fix ptx
2026-04-30 18:41:37 -07:00
wozeparrot
528d35e306 llama speed 4 (#15993) 2026-04-30 17:14:41 -07:00
George Hotz
45fd7a3668 lil_image vectorize (#16000)
* lil_image vectorize

* 0 pitch on height 1

* Revert "0 pitch on height 1"

This reverts commit 58a83e6622.
2026-04-30 16:12:43 -07:00
wozeparrot
eddcd4723b am_smi throttle info (#15997) 2026-04-30 15:28:32 -07:00
chenyu
52c92e15ae no replacement multinomial (#15995)
* no replacement multinomial

Efraimidis–Spirakis

* num_samples == 1 can use fast path
2026-04-30 17:35:26 -04:00
chenyu
e0b09f288f input validation for rand functions (#15990) 2026-04-30 14:00:44 -04:00
nimlgen
11e1a2b89f cleaner and faster run_linear (#15987)
* cleaner and faster run_linear

* x

* assert for now

* x

* x

* sym_infer

* remove sink
2026-04-30 20:15:22 +03:00
qazal
58b34e71bd failing test for llama useless copies (#15989) 2026-05-01 00:55:29 +09:00
George Hotz
0f7e296f5b fix some indexing edge cases (#15988) 2026-04-30 08:05:30 -07:00
nimlgen
6f8b10d251 remove base Runner (#15986)
* remove base Runner

* linters
2026-04-30 13:04:55 +03:00
George Hotz
46a36a838a small dtype shapes fixups (#15984) 2026-04-29 19:40:38 -07:00
chenyu
b73248958a minor rand cleanups (#15982) 2026-04-29 22:22:29 -04:00
chenyu
53a28bafbd rand device seed to its own function (#15979) 2026-04-29 17:21:40 -04:00
Christopher Milan
d07741f1d7 am: look for firmware in /lib/firmware/amdgpu (#15974) 2026-04-29 17:15:09 -04:00
nimlgen
c73e667fc0 remove if for precompiled programs (#15980) 2026-04-29 23:43:36 +03:00
qazal
55915584e5 viz: fix cfg for emulated amd on the null device (#15976)
* simple failing when i test it end to end

* pass

* linter

* assemble
2026-04-30 05:18:09 +09:00
nimlgen
dfd2d07005 remove CompiledRunner (#15970)
* rm usage of CompiledRunner

* more tests

* last

* linter

* sink

* remove

* linter
2026-04-29 22:45:48 +03:00
wozeparrot
0080489abe llama: use env vars (#15978) 2026-04-29 12:37:15 -07:00
qazal
a37b605523 remove arch from asm kernel class (#15977)
* rm arch from kernel

* update other tests

* update abstractions4.py
2026-04-30 03:39:52 +09:00
Christopher Milan
7a79c2948a DEV visible device filter supports hyphenated syntax (#15971) 2026-04-29 14:02:21 -04:00
Christopher Milan
6b9a45568c autogen: better version handling for llvm and libclang (#15975) 2026-04-29 14:01:33 -04:00