nimlgen
e5729935c6
time_call ( #16152 )
* time_call
* x
* fix caches
2026-05-12 16:58:28 +03:00
qazal
fe39cf148a
add Ops.SOURCE test ( #16155 )
* simple failing test
* raises
* change
2026-05-12 22:49:32 +09:00
qazal
5cd0494b14
viz: canonicalize ast for schedule to codegen linking ( #16154 )
* simple failing test
* always null device
* viz: canonicalize ast for schedule to codegen linking
* SCACHE
2026-05-12 22:40:21 +09:00
chenyu
09fd80fba6
fix randperm and _multi_like drop requires_grad ( #16150 )
2026-05-11 23:23:34 -04:00
George Hotz
8294d105a7
Update the spec in spec.py to match the current state ( #16132 )
* start work on specv2
* more spec
* more spec
* fix amd emulator
* more spec
* more
* fix test_uop_graph
* move those
* spec=2
* skip those questionable tests
* ptx fix
* more spec=2
* store
* allow custom function in tensor
* spec 2
* fix beam search for tensor cores
* delete the old specs
* fix import
2026-05-11 20:07:47 -07:00
chenyu
3942a80f66
fix wrong kwargs passed into rands ( #16149 )
working towards explicit args for these
2026-05-11 22:22:06 -04:00
Christopher Milan
039d84ff02
Revert "onnx: deduplicate simple proto parsers" ( #16148 )
This reverts commit 83eaefcd0f.
2026-05-11 21:45:17 -04:00
chenyu
63c1f00b80
disable test_svd_general again ( #16146 )
flaky on CI
2026-05-11 19:24:32 -04:00
chenyu
0b02fb6797
Revert "[pr] match torch rmsnorm ( #16122 )" ( #16144 )
This reverts commit 692257dd70.
2026-05-11 17:53:42 -04:00
chenyu
fbe8be0b8b
style cleanup to Tensor.qr and svd ( #16142 )
* style cleanup to Tensor.qr and svd
same kernels
* more
* enable
2026-05-11 17:16:59 -04:00
Joshua James Venter
692257dd70
[pr] match torch rmsnorm ( #16122 )
* [pr] match rmsnorm torch
Signed-off-by: Joshua James Venter <venter.joshua@gmail.com>
* 1e-5
* ops.md
---------
Signed-off-by: Joshua James Venter <venter.joshua@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-05-11 14:36:41 -04:00
nimlgen
70c2480e71
hcq2 to extra ( #16126 )
* hcq2 in extra
* correct
* some revert from non-extra
* cln
* cpu
* x
* attach
* min
* remove attach
* linter
2026-05-11 17:17:30 +03:00
nimlgen
ad9738892c
get_buf() for Buffer ( #16134 )
* p
* mypy
* x
2026-05-11 16:36:14 +03:00
qazal
2dd84416bf
viz/cli: schedule renderer ( #16101 )
* simpler steps
* work
* work
* iterate
* faster
* better
* simplify more
* sys stdin
* less
* work
* work and mv
* better
* seen bufs
* all call graphs
* print query
* ux
* param to buffer / buffer_view
* work
* respect NO_COLOR in uop_to_json
* less
* render uops
* rm custom renderer
* call can't pyrender.
* unrelated diff
* assert
* 5
2026-05-11 01:56:16 +09:00
George Hotz
daed602569
rename BUFFERIZE to STAGE ( #16125 )
2026-05-10 09:26:46 -07:00
qazal
39ce780907
viz/cli: emit all runs of selected kernel, json fixes ( #16124 )
* keep print
* --json in tests, sqtt --json err
* work
* import
* less
* line
2026-05-10 21:45:51 +09:00
qazal
51c7dafb0d
split viz cli test helpers ( #16123 )
2026-05-10 19:42:24 +09:00
Pawan
4dd6ad3514
gradient: add TRUNC backward ( #15925 )
* gradient: add TRUNC backward
* test: move round quantization gradient to test_ops
2026-05-08 16:27:55 -07:00
chenyu
235044c9d8
Ops.IDIV -> Ops.CDIV, Ops.MOD -> Ops.CMOD ( #16093 )
* Ops.IDIV -> Ops.CDIV, Ops.MOD -> Ops.CMOD
* ruff
2026-05-07 23:18:15 -04:00
June
83eaefcd0f
onnx: deduplicate simple proto parsers ( #16085 )
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-05-07 18:44:27 -07:00
wozeparrot
d11f4d0ec2
fix: don't copy on slice of DP weight ( #16089 )
2026-05-07 17:58:01 -07:00
George Hotz
b796bbae87
fix valid in indexing tests ( #16087 )
2026-05-07 14:11:28 -07:00
wozeparrot
4d1a9dca41
fix: don't copy precompiled custom kernel outputs ( #16084 )
2026-05-07 14:02:38 -07:00
chenyu
072db9924c
div to mixin ( #16078 )
also deleted idiv method
2026-05-07 12:52:37 -04:00
chenyu
516b00e286
mod and fmod to mixin ( #16077 )
2026-05-07 12:13:39 -04:00
chenyu
ef085304bc
stronger divmod_recombine ( #16066 )
2026-05-06 15:41:54 -04:00
chenyu
af4140f3be
fix divmod recombine for floordiv ( #16062 )
2026-05-06 14:22:42 -04:00
chenyu
c6ad3d3ac2
better divmod late rewrite ( #16061 )
better order
2026-05-06 11:31:48 -04:00
chenyu
aaabe42373
relax fold_divmod_general ( #16058 )
2026-05-05 21:37:56 -04:00
chenyu
869eae6b37
fix double div rewrites ( #16054 )
2026-05-05 19:34:35 -04:00
qazal
795501e1da
fix device in null graph events ( #16053 )
* failing test
* fix compute
* fix sdma
2026-05-06 07:44:08 +09:00
chenyu
34fe37d64e
use FLOORDIV and FLOORMOD ( #16048 )
* use FLOORDIV and FLOORMOD
also removed CORRECT_DIVMOD_FOLDING
* fix
* Revert "fix"
This reverts commit 86af33b88ef31943c61e67189b072eca4896409a.
* fix
* fix
2026-05-05 18:32:54 -04:00
nimlgen
5fa0016ffc
supports_exec_item -> supports_uop ( #16033 )
2026-05-05 22:41:13 +03:00
chenyu
9c37a0c75d
Ops.FLOORDIV and Ops.FLOORMOD ( #16038 )
* Ops.FLOORDIV and Ops.FLOORMOD
lowered into IDIV and MOD in get_late_rewrite_patterns
* still need this
* exclude
* like that?
2026-05-05 11:42:14 -04:00
Christopher Milan
1c8cb0769a
am: autogen asic_regs ( #16004 )
2026-05-04 22:52:07 -04:00
George Hotz
26406bed83
amd uses .valid, not index src valid ( #16042 )
2026-05-04 18:35:15 -07:00
Christopher Milan
8e99c4f097
fetch checks sha256 ( #16037 )
2026-05-04 16:08:38 -04:00
George Hotz
1884f67a39
simplify full_rewrite_to_sink spec ( #16035 )
* simplify full_rewrite_to_sink spec
* test cleanups
2026-05-04 11:44:13 -07:00
qazal
b1d88ebf02
viz/cli: aggregate flops in -t ( #16031 )
* 38
* plumbing
* more flops
* flop/s and bytes/s
* arithmetic mean
* tests
* harmonic mean
* range
* better
* simplify
* fix prints
* no string parsing needed
2026-05-04 17:35:02 +03:00
qazal
c02e390c2b
viz: encode flops, mem and metadata in json ( #16032 )
* gate print
* update everywhere to check path
* server encodes json
* ui changes
* cli changes
* tests never need regex
* no str replace
* update test_pipes
* remove that
2026-05-04 23:06:18 +09:00
bigyoshi
4024d8438f
runtime/graph: avoid core_id runtimevar merge conflicts ( #16026 )
Co-authored-by: bigyoshi51 <269989564+bigyoshi51@users.noreply.github.com>
2026-05-03 19:16:02 +03:00
qazal
9684334dfe
viz: fix flops in graph, add null graph tracing ( #16024 )
* min repro, todos
* null graph tracing
* work
* work
* work
* only test_flops
* exec points back
* first
* better
* integral timestamps maybe
* cleanup
* simpler, update NULL to use SDMA naming
* integration test
* sdma
2026-05-03 22:32:44 +09:00
wozeparrot
419d525553
feat: handle multioutput kernel grads ( #16028 )
2026-05-02 22:31:45 -07:00
qazal
7daf4b7d52
viz: split cli test ( #16015 )
* viz: split cli test
* arg3 is msg
2026-05-03 01:47:11 +09:00
chenyu
782d1ff80f
Tensor.fmod ( #16014 )
c-style mod matches torch
2026-05-01 16:02:18 -04:00
qazal
8b147a9ed5
minimal repro for llama copies 2 ( #16011 )
2026-05-01 22:23:47 +09:00
qazal
a29dd7b19b
Revert "cleanup: untrack wait Metal buffers ( #15954 )" ( #16010 )
* Revert "cleanup: untrack wait Metal buffers (#15954)"
This reverts commit 5eb1fd5d3c.
* regression test fixes
2026-05-01 21:18:19 +09:00
qazal
65879fe1b7
metal synchronize regression test ( #16008 )
* add test for metal wait=True
* add self.assertRaises
2026-05-01 20:10:57 +09:00
nimlgen
f6d92b55e6
am: use per pipe reset for gfx11+ ( #16006 )
2026-05-01 12:56:43 +03:00
George Hotz
4506688285
split render to render.py ( #16002 )
* split render to render.py
* move more print
2026-04-30 19:41:14 -07:00