Christopher Milan
8e99c4f097
fetch checks sha256 (#16037)
2026-05-04 16:08:38 -04:00
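The commit title says that `fetch` now checks a SHA-256 digest of what it downloads. Below is a minimal sketch of that general checksum idea using Python's `hashlib`. `verify_sha256` is a hypothetical helper. It is not tinygrad's `fetch` signature or code.

```python
import hashlib

def verify_sha256(data: bytes, expected_hex: str) -> bytes:
  # hypothetical helper: hash the downloaded bytes and compare with the expected hex digest
  digest = hashlib.sha256(data).hexdigest()
  if digest != expected_hex.lower():
    raise ValueError(f"sha256 mismatch: expected {expected_hex}, got {digest}")
  return data  # only verified data is handed back to the caller

payload = b"hello"
print(verify_sha256(payload, hashlib.sha256(payload).hexdigest()) == payload)
```

When the digest does not match, the helper raises instead of returning the data. A corrupted or tampered download therefore never reaches the caller.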
George Hotz
1884f67a39
simplify full_rewrite_to_sink spec (#16035)
* simplify full_rewrite_to_sink spec
* test cleanups
2026-05-04 11:44:13 -07:00
qazal
b1d88ebf02
viz/cli: aggregate flops in -t (#16031)
* 38
* plumbing
* more flops
* flop/s and bytes/s
* arithmetic mean
* tests
* harmonic mean
* range
* better
* simplify
* fix prints
* no string parsing needed
2026-05-04 17:35:02 +03:00
qazal
c02e390c2b
viz: encode flops, mem and metadata in json (#16032)
* gate print
* update everywhere to check path
* server encodes json
* ui changes
* cli changes
* tests never need regex
* no str replace
* update test_pipes
* remove that
2026-05-04 23:06:18 +09:00
bigyoshi
4024d8438f
runtime/graph: avoid core_id runtimevar merge conflicts (#16026)
Co-authored-by: bigyoshi51 <269989564+bigyoshi51@users.noreply.github.com>
2026-05-03 19:16:02 +03:00
qazal
9684334dfe
viz: fix flops in graph, add null graph tracing (#16024)
* min repro, todos
* null graph tracing
* work
* work
* work
* only test_flops
* exec points back
* first
* better
* integral timestamps maybe
* cleanup
* simpler, update NULL to use SDMA naming
* integration test
* sdma
2026-05-03 22:32:44 +09:00
wozeparrot
419d525553
feat: handle multioutput kernel grads (#16028)
2026-05-02 22:31:45 -07:00
qazal
7daf4b7d52
viz: split cli test (#16015)
* viz: split cli test
* arg3 is msg
2026-05-03 01:47:11 +09:00
chenyu
782d1ff80f
Tensor.fmod (#16014)
C-style mod, matches torch
2026-05-01 16:02:18 -04:00
qazal
8b147a9ed5
minimal repro for llama copies 2 (#16011)
2026-05-01 22:23:47 +09:00
qazal
a29dd7b19b
Revert "cleanup: untrack wait Metal buffers (#15954)" (#16010)
* Revert "cleanup: untrack wait Metal buffers (#15954)"
This reverts commit 5eb1fd5d3c.
* regression test fixes
2026-05-01 21:18:19 +09:00
qazal
65879fe1b7
metal synchronize regression test (#16008)
* add test for metal wait=True
* add self.assertRaises
2026-05-01 20:10:57 +09:00
nimlgen
f6d92b55e6
am: use per pipe reset for gfx11+ (#16006)
2026-05-01 12:56:43 +03:00
George Hotz
4506688285
split render to render.py (#16002)
* split render to render.py
* move more print
2026-04-30 19:41:14 -07:00
chenyu
52c92e15ae
no replacement multinomial (#15995)
* no replacement multinomial
Efraimidis–Spirakis
* num_samples == 1 can use fast path
2026-04-30 17:35:26 -04:00
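Efraimidis–Spirakis is a standard algorithm for weighted sampling without replacement. Each item i with weight w_i draws u_i uniformly from [0, 1) and gets the key u_i^(1/w_i). The k items with the largest keys form the sample. Below is a minimal pure-Python sketch of that algorithm. It is not tinygrad's `Tensor.multinomial` code, and `sample_without_replacement` is a hypothetical name.

```python
import heapq
import random

def sample_without_replacement(weights: list[float], k: int, rng: random.Random) -> list[int]:
  # Efraimidis–Spirakis (A-ES): key_i = u_i ** (1 / w_i) with u_i ~ U[0, 1),
  # then keep the k largest keys; items with weight <= 0 are never chosen
  if k > sum(1 for w in weights if w > 0):
    raise ValueError("k exceeds the number of items with positive weight")
  keys = []
  for i, w in enumerate(weights):
    if w <= 0: continue
    u = rng.random()
    keys.append((u ** (1.0 / w), i))  # larger w pushes the key toward 1, so heavier items win more often
  return [i for _, i in heapq.nlargest(k, keys)]

rng = random.Random(0)
print(sample_without_replacement([1.0, 2.0, 0.0, 5.0], 2, rng))
```

A vectorized version usually works with log(u_i) / w_i, which ranks items the same way and avoids underflow for small weights. When only one sample is drawn (num_samples == 1), no ranking is needed. This matches the commit's note that that case can take a fast path.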
chenyu
e0b09f288f
input validation for rand functions (#15990)
2026-04-30 14:00:44 -04:00
nimlgen
11e1a2b89f
cleaner and faster run_linear (#15987)
* cleaner and faster run_linear
* x
* assert for now
* x
* x
* sym_infer
* remove sink
2026-04-30 20:15:22 +03:00
qazal
58b34e71bd
failing test for llama useless copies (#15989)
2026-05-01 00:55:29 +09:00
George Hotz
0f7e296f5b
fix some indexing edge cases (#15988)
2026-04-30 08:05:30 -07:00
qazal
55915584e5
viz: fix cfg for emulated amd on the null device (#15976)
* simple failing when I test it end to end
* pass
* linter
* assemble
2026-04-30 05:18:09 +09:00
nimlgen
dfd2d07005
remove CompiledRunner (#15970)
* rm usage of CompiledRunner
* more tests
* last
* linter
* sink
* remove
* linter
2026-04-29 22:45:48 +03:00
qazal
a37b605523
remove arch from asm kernel class (#15977)
* rm arch from kernel
* update other tests
* update abstractions4.py
2026-04-30 03:39:52 +09:00
chenyu
654e611a29
_bits_to_rand to mixin (#15972)
2026-04-29 13:47:25 -04:00
George Hotz
5f441ecffc
unify reduce + reduce_axis (#15973)
* unify reduce + reduce_axis
* fix all tests
* lil cleanups
2026-04-29 10:29:56 -07:00
nimlgen
7787f76dcc
get_runner -> get_runtime (#15967)
* get_runner -> get_runtime
* do not use get_runner
* fix
* remove get_runner
* remove
* fix
* x
2026-04-29 18:29:49 +03:00
chenyu
fb188c3c23
UOp.bitcast noop early return (#15968)
matches Tensor
2026-04-29 09:41:40 -04:00
chenyu
c4bea54e9c
_threefry_random_bits to mixin (#15959)
start RandMixin
2026-04-28 19:13:57 -04:00
Nino Risteski
5eb1fd5d3c
cleanup: untrack wait Metal buffers (#15954)
2026-04-28 12:54:59 -07:00
nimlgen
77965a22e5
local optimize as rewrite (#15953)
* local optimize as rewrite
* better
* x
* slightly rename
* fix
* ugh
* remove
* x
* remove
* not weak
2026-04-28 22:51:04 +03:00
qazal
54f00e1013
sqtt: correct rdna4 structs (#15948)
2026-04-28 07:35:50 +09:00
qazal
c58fd85a99
sqtt: add needs_rocprof decorator (#15947)
* sqtt: add needs_rocprof decorator
* version string
2026-04-28 06:22:50 +09:00
chenyu
77f9125c21
move Tensor.pad to OpMixin (#15946)
2026-04-27 16:56:04 -04:00
nimlgen
4164666c72
programinfo (#15942)
* programinfo
* fix
* m
* x
* x
* changes
* x
* fix
* rm
2026-04-27 23:12:03 +03:00
chenyu
fe38d6de94
_pad_circular and _pad_reflect_replicate to mixin (#15944)
2026-04-27 16:07:05 -04:00
qazal
8c174bdad4
viz/sqtt: correct exec pipes (#15885)
* wmma
* p2
* test
* left
* work
* pickle
* handwritten failing tests
* start work
* test the pipes
* empirical evidence
* update rdna4 enum types
* VALU pipe 1
* TRANSCENDENTAL pipe
* transcendental function units
* reorder
* wmma pipe
* cleanup and notes
* smaller
* work
* diff cleanup
* pickle
* use se:1
* int
2026-04-28 05:05:49 +09:00
nimlgen
96165ff0d1
validate_with_cpu as rewrite (#15938)
* validate_with_cpu as rewrite
* compile
* x
* linter
* moved
* fix
2026-04-26 19:58:53 +03:00
nimlgen
117e9e22dd
estimates from graph (#15937)
* estimates from graph
* test
* x
2026-04-26 18:22:53 +03:00
nimlgen
bb652352c7
remove execitem (#15932)
* remove execitem
* f
* x
2026-04-25 19:33:04 +03:00
nimlgen
e0ff6cc15c
remove old schedule (#15930)
* remove old schedule
* tests
* r
* x
2026-04-25 16:46:36 +03:00
qazal
9a23de7d27
viz/cli: unify profile and rewrites, -s ALL default (#15931)
* work
* work
* better
* cleanup
* better defaults
* --ls
* better
* work
* update llama
* update
2026-04-25 22:31:24 +09:00
nimlgen
a5e9ea7a60
remove schedule batch 4 (#15927)
* remove schedule batch 4
* fini
2026-04-25 12:36:55 +03:00
nimlgen
d2ab6ea7a6
remove schedule batch 3 (#15924)
* remove schedule batch 3
* batch 6
* batch 7
2026-04-25 11:53:16 +03:00
nimlgen
3c8a2db870
remove schedule() from tests batch 2 (#15923)
* remove schedule() from tests batch 2
* batch 4
2026-04-25 10:44:41 +03:00
Denys Melnyk
1fdcb13bfb
webgpu: fix weight lookup in export_model after compile_net key change (#15919)
* fix lookup site in export_model_webgpu after refactoring
webgpu (sd): fix export_model weight lookup after compile_net changes
* add regression test
2026-04-25 10:04:55 +03:00
Christopher Milan
57fbaa3d49
amd: fallback to llvm when comgr is not available (#15914)
2026-04-24 23:30:16 -04:00
nimlgen
d3378010ee
schedule() -> schedule_linear() in tests (batch 1) (#15915)
* schedule_with_vars -> linear_with_vars in tests
* tests batch 1
* batch 2
* estimate_uop
* simpler
* rm
2026-04-24 23:40:53 +03:00
chenyu
b501ba3e42
nll_loss to mixin (#15918)
2026-04-24 15:50:31 -04:00
chenyu
2f9fdb4a37
scatter to mixin (#15917)
2026-04-24 15:37:37 -04:00
nimlgen
f2751955cb
remove linear_to_schedule from tests (#15912)
* remove linear_to_schedule from tests
* x
2026-04-24 20:02:10 +03:00
chenyu
03a7604f76
sort argsort topk allclose to mixin (#15910)
2026-04-24 10:20:46 -04:00