Commit Graph

5694 Commits

Author SHA1 Message Date
nimlgen
e5729935c6 time_call (#16152)
* time_call

* x

* fix caches
2026-05-12 16:58:28 +03:00
qazal
fe39cf148a add Ops.SOURCE test (#16155)
* simple failing test

* raises

* change
2026-05-12 22:49:32 +09:00
qazal
5cd0494b14 viz: canonicalize ast for schedule to codegen linking (#16154)
* simple failing test

* always null device

* viz: canonicalize ast for schedule to codegen linking

* SCACHE
2026-05-12 22:40:21 +09:00
chenyu
09fd80fba6 fix randperm and _multi_like drop requires_grad (#16150) 2026-05-11 23:23:34 -04:00
George Hotz
8294d105a7 Update the spec in spec.py to match the current state (#16132)
* start work on specv2

* more spec

* more spec

* fix amd emulator

* more spec

* more

* fix test_uop_graph

* move those

* spec=2

* skip those questionable tests

* ptx fix

* more spec=2

* store

* allow custom function in tensor

* spec 2

* fix beam search for tensor cores

* delete the old specs

* fix import
2026-05-11 20:07:47 -07:00
chenyu
3942a80f66 fix wrong kwargs passed into rands (#16149)
working towards explicit args for these
2026-05-11 22:22:06 -04:00
Christopher Milan
039d84ff02 Revert "onnx: deduplicate simple proto parsers" (#16148)
This reverts commit 83eaefcd0f.
2026-05-11 21:45:17 -04:00
chenyu
63c1f00b80 disable test_svd_general again (#16146)
flaky on CI
2026-05-11 19:24:32 -04:00
chenyu
0b02fb6797 Revert "[pr] match torch rmsnorm (#16122)" (#16144)
This reverts commit 692257dd70.
2026-05-11 17:53:42 -04:00
chenyu
fbe8be0b8b style cleanup to Tensor.qr and svd (#16142)
* style cleanup to Tensor.qr and svd

same kernels

* more

* enable
2026-05-11 17:16:59 -04:00
Joshua James Venter
692257dd70 [pr] match torch rmsnorm (#16122)
* [pr] match rmsnorm torch

Signed-off-by: Joshua James Venter <venter.joshua@gmail.com>

* 1e-5

* ops.md

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-05-11 14:36:41 -04:00
nimlgen
70c2480e71 hcq2 to extra (#16126)
* hcq2 in extra

* correct

* some revert from non-extra

* cln

* cpu

* x

* attach

* min

* remove attach

* linter
2026-05-11 17:17:30 +03:00
nimlgen
ad9738892c get_buf() for Buffer (#16134)
* p

* mypy

* x
2026-05-11 16:36:14 +03:00
qazal
2dd84416bf viz/cli: schedule renderer (#16101)
* simpler steps

* work

* work

* iterate

* faster

* better

* simplify more

* sys stdin

* less

* work

* work and mv

* better

* seen bufs

* all call graphs

* print query

* ux

* param to buffer / buffer_view

* work

* respect NO_COLOR in uop_to_json

* less

* render uops

* rm custom renderer

* call can't pyrender.

* unrelated diff

* assert

* 5
2026-05-11 01:56:16 +09:00
George Hotz
daed602569 rename BUFFERIZE to STAGE (#16125) 2026-05-10 09:26:46 -07:00
qazal
39ce780907 viz/cli: emit all runs of selected kernel, json fixes (#16124)
* keep print

* --json in tests, sqtt --json err

* work

* import

* less

* line
2026-05-10 21:45:51 +09:00
qazal
51c7dafb0d split viz cli test helpers (#16123) 2026-05-10 19:42:24 +09:00
Pawan
4dd6ad3514 gradient: add TRUNC backward (#15925)
* gradient: add TRUNC backward

* test: move round quantization gradient to test_ops
2026-05-08 16:27:55 -07:00
chenyu
235044c9d8 Ops.IDIV -> Ops.CDIV, Ops.MOD -> Ops.CMOD (#16093)
* Ops.IDIV -> Ops.CDIV, Ops.MOD -> Ops.CMOD

* ruff
2026-05-07 23:18:15 -04:00
June
83eaefcd0f onnx: deduplicate simple proto parsers (#16085)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-05-07 18:44:27 -07:00
wozeparrot
d11f4d0ec2 fix: don't copy on slice of DP weight (#16089) 2026-05-07 17:58:01 -07:00
George Hotz
b796bbae87 fix valid in indexing tests (#16087) 2026-05-07 14:11:28 -07:00
wozeparrot
4d1a9dca41 fix: don't copy precompiled custom kernel outputs (#16084) 2026-05-07 14:02:38 -07:00
chenyu
072db9924c div to mixin (#16078)
also deleted idiv method
2026-05-07 12:52:37 -04:00
chenyu
516b00e286 mod and fmod to mixin (#16077) 2026-05-07 12:13:39 -04:00
chenyu
ef085304bc stronger divmod_recombine (#16066) 2026-05-06 15:41:54 -04:00
chenyu
af4140f3be fix divmod recombine for floordiv (#16062) 2026-05-06 14:22:42 -04:00
chenyu
c6ad3d3ac2 better divmod late rewrite (#16061)
better order
2026-05-06 11:31:48 -04:00
chenyu
aaabe42373 relax fold_divmod_general (#16058) 2026-05-05 21:37:56 -04:00
chenyu
869eae6b37 fix double div rewrites (#16054) 2026-05-05 19:34:35 -04:00
qazal
795501e1da fix device in null graph events (#16053)
* failing test

* fix compute

* fix sdma
2026-05-06 07:44:08 +09:00
chenyu
34fe37d64e use FLOORDIV and FLOORMOD (#16048)
* use FLOORDIV and FLOORMOD

also removed CORRECT_DIVMOD_FOLDING

* fix

* Revert "fix"

This reverts commit 86af33b88ef31943c61e67189b072eca4896409a.

* fix

* fix
2026-05-05 18:32:54 -04:00
nimlgen
5fa0016ffc supports_exec_item -> supports_uop (#16033) 2026-05-05 22:41:13 +03:00
chenyu
9c37a0c75d Ops.FLOORDIV and Ops.FLOORMOD (#16038)
* Ops.FLOORDIV and Ops.FLOORMOD

lowered into IDIV and MOD in get_late_rewrite_patterns

* still need this

* exclude

* like that?
2026-05-05 11:42:14 -04:00
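The commit above says FLOORDIV and FLOORMOD are lowered into the truncating IDIV and MOD ops. The sketch below illustrates, on plain Python integers, how floor semantics can be recovered from C-style truncating division and remainder. The function names `cdiv`, `cmod`, `floordiv` and `floormod` are hypothetical. This is not the code in `get_late_rewrite_patterns`, only an assumption-labeled illustration of the arithmetic identity.

```python
# Hypothetical helpers illustrating the identity, not tinygrad's rewrite rules.

def cdiv(a: int, b: int) -> int:
  """C-style (truncating) division: the quotient is rounded toward zero."""
  q = abs(a) // abs(b)
  return q if (a < 0) == (b < 0) else -q

def cmod(a: int, b: int) -> int:
  """C-style remainder: the result takes the sign of the dividend a."""
  return a - cdiv(a, b) * b

def floordiv(a: int, b: int) -> int:
  """Floor division expressed with truncating division and remainder."""
  q, r = cdiv(a, b), cmod(a, b)
  # truncation rounded the quotient up when the remainder and divisor differ in sign
  return q - 1 if r != 0 and (r < 0) != (b < 0) else q

def floormod(a: int, b: int) -> int:
  """Floor modulo: the result takes the sign of the divisor b."""
  r = cmod(a, b)
  return r + b if r != 0 and (r < 0) != (b < 0) else r

print(cdiv(-7, 2), cmod(-7, 2))          # -3 -1
print(floordiv(-7, 2), floormod(-7, 2))  # -4 1
```

The correction is a single compare-and-select, which is why the floor variants are cheap to express on top of hardware that only provides truncating division.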
Christopher Milan
1c8cb0769a am: autogen asic_regs (#16004) 2026-05-04 22:52:07 -04:00
George Hotz
26406bed83 amd uses .valid, not index src valid (#16042) 2026-05-04 18:35:15 -07:00
Christopher Milan
8e99c4f097 fetch checks sha256 (#16037) 2026-05-04 16:08:38 -04:00
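As a rough illustration of what checking a download against a SHA-256 digest involves, here is a minimal sketch using the standard `hashlib` module. The helper names `sha256_of_stream` and `verify_sha256` are hypothetical and do not reflect the actual signature or behavior of the repository's `fetch`.

```python
import hashlib
import io

def sha256_of_stream(stream, chunk_size: int = 1 << 16) -> str:
  """Hash a binary file-like object incrementally so large downloads are not held in memory."""
  h = hashlib.sha256()
  for chunk in iter(lambda: stream.read(chunk_size), b""):
    h.update(chunk)
  return h.hexdigest()

def verify_sha256(stream, expected_hex: str) -> None:
  """Raise ValueError if the stream's SHA-256 digest differs from expected_hex."""
  actual = sha256_of_stream(stream)
  if actual != expected_hex.lower():
    raise ValueError(f"sha256 mismatch: expected {expected_hex}, got {actual}")

# the digest of the empty input is a well-known constant
verify_sha256(io.BytesIO(b""), "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855")
```

Hashing in fixed-size chunks keeps memory use constant regardless of file size; comparing against a lowercase hex string makes the check insensitive to how the expected digest was written.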
George Hotz
1884f67a39 simplify full_rewrite_to_sink spec (#16035)
* simplify full_rewrite_to_sink spec

* test cleanups
2026-05-04 11:44:13 -07:00
qazal
b1d88ebf02 viz/cli: aggregate flops in -t (#16031)
* 38

* plumbing

* more flops

* flop/s and bytes/s

* arithmetic mean

* tests

* harmonic mean

* range

* better

* simplify

* fix prints

* no string parsing needed
2026-05-04 17:35:02 +03:00
qazal
c02e390c2b viz: encode flops, mem and metadata in json (#16032)
* gate print

* update everywhere to check path

* server encodes json

* ui changes

* cli changes

* tests never need regex

* no str replace

* update test_pipes

* remove that
2026-05-04 23:06:18 +09:00
bigyoshi
4024d8438f runtime/graph: avoid core_id runtimevar merge conflicts (#16026)
Co-authored-by: bigyoshi51 <269989564+bigyoshi51@users.noreply.github.com>
2026-05-03 19:16:02 +03:00
qazal
9684334dfe viz: fix flops in graph, add null graph tracing (#16024)
* min repro, todos

* null graph tracing

* work

* work

* work

* only test_flops

* exec points back

* first

* better

* integral timestamps maybe

* cleanup

* simpler, update NULL to use SDMA naming

* integration test

* sdma
2026-05-03 22:32:44 +09:00
wozeparrot
419d525553 feat: handle multioutput kernel grads (#16028) 2026-05-02 22:31:45 -07:00
qazal
7daf4b7d52 viz: split cli test (#16015)
* viz: split cli test

* arg3 is msg
2026-05-03 01:47:11 +09:00
chenyu
782d1ff80f Tensor.fmod (#16014)
c-style mod matches torch
2026-05-01 16:02:18 -04:00
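To make "c-style mod" concrete: in C-style `fmod` the result has the sign of the dividend, while a floor modulo (Python's `%`) has the sign of the divisor. The sketch below uses Python's `math.fmod` as the reference for C semantics; `c_fmod` is a hypothetical formulation via truncation, not the library's `Tensor.fmod` implementation.

```python
import math

def c_fmod(x: float, y: float) -> float:
  """Hypothetical C-style fmod: x - trunc(x / y) * y, so the sign follows the dividend x."""
  return x - math.trunc(x / y) * y

print(math.fmod(-7.0, 3.0), c_fmod(-7.0, 3.0))  # -1.0 -1.0
print(-7.0 % 3.0)                               # 2.0 (floor modulo, sign follows the divisor)
print(math.fmod(7.0, -3.0), 7.0 % -3.0)         # 1.0 -2.0
```

The two conventions agree whenever the operands share a sign and differ by exactly one divisor otherwise. Note that computing `x / y` in floating point can round for large or non-representable values, so a formulation like `c_fmod` is only exact for well-behaved inputs such as those shown.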
qazal
8b147a9ed5 minimal repro for llama copies 2 (#16011) 2026-05-01 22:23:47 +09:00
qazal
a29dd7b19b Revert "cleanup: untrack wait Metal buffers (#15954)" (#16010)
* Revert "cleanup: untrack wait Metal buffers (#15954)"

This reverts commit 5eb1fd5d3c.

* regression test fixes
2026-05-01 21:18:19 +09:00
qazal
65879fe1b7 metal synchronize regression test (#16008)
* add test for metal wait=True

* add self.assertRaises
2026-05-01 20:10:57 +09:00
nimlgen
f6d92b55e6 am: use per pipe reset for gfx11+ (#16006) 2026-05-01 12:56:43 +03:00
George Hotz
4506688285 split render to render.py (#16002)
* split render to render.py

* move more print
2026-04-30 19:41:14 -07:00