Commit Graph

5569 Commits

Author SHA1 Message Date
chenyu
ef085304bc stronger divmod_recombine (#16066) 2026-05-06 15:41:54 -04:00
chenyu
af4140f3be fix divmod recombine for floordiv (#16062) 2026-05-06 14:22:42 -04:00
chenyu
c6ad3d3ac2 better divmod late rewrite (#16061)
better order
2026-05-06 11:31:48 -04:00
chenyu
aaabe42373 relax fold_divmod_general (#16058) 2026-05-05 21:37:56 -04:00
chenyu
869eae6b37 fix double div rewrites (#16054) 2026-05-05 19:34:35 -04:00
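Background sketch, not taken from these commits: divmod rewrites of this kind rely on integer identities. Two examples are x == (x//c)*c + x%c, and (x//a)//b == x//(a*b) when the divisors are positive. The plain-Python check below verifies both under floor semantics over a small range. It does not reproduce tinygrad's rewrite rules, and the helper name check_divmod_identities is hypothetical.

```python
def check_divmod_identities(lo: int = -50, hi: int = 50, max_div: int = 7) -> bool:
    """Brute-force check of two floor-division identities (hypothetical helper)."""
    for x in range(lo, hi + 1):
        for c in range(1, max_div + 1):
            for d in (c, -c):
                # recombine: quotient and floored remainder reassemble the dividend
                if (x // d) * d + x % d != x:
                    return False
            for b in range(1, max_div + 1):
                # nested floor division by positive divisors collapses into one
                if (x // c) // b != x // (c * b):
                    return False
    return True

print(check_divmod_identities())  # True
```

The nested-division identity requires positive divisors. A rewrite that folds (x//a)//b has to establish that precondition before it applies.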
qazal
795501e1da fix device in null graph events (#16053)
* failing test

* fix compute

* fix sdma
2026-05-06 07:44:08 +09:00
chenyu
34fe37d64e use FLOORDIV and FLOORMOD (#16048)
* use FLOORDIV and FLOORMOD

also removed CORRECT_DIVMOD_FOLDING

* fix

* Revert "fix"

This reverts commit 86af33b88ef31943c61e67189b072eca4896409a.

* fix

* fix
2026-05-05 18:32:54 -04:00
nimlgen
5fa0016ffc supports_exec_item -> supports_uop (#16033) 2026-05-05 22:41:13 +03:00
chenyu
9c37a0c75d Ops.FLOORDIV and Ops.FLOORMOD (#16038)
* Ops.FLOORDIV and Ops.FLOORMOD

lowered into IDIV and MOD in get_late_rewrite_patterns

* still need this

* exclude

* like that?
2026-05-05 11:42:14 -04:00
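Commit 9c37a0c75d says FLOORDIV and FLOORMOD are lowered into IDIV and MOD. The sketch below assumes that IDIV and MOD have C-style truncating semantics, which the message does not state. Under that assumption, it shows one standard way to recover floored results from truncated ones in plain Python. It illustrates the arithmetic only and is not the rewrite tinygrad uses. All function names are hypothetical.

```python
def trunc_div(a: int, b: int) -> int:
    # C-style integer division: rounds toward zero
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def trunc_mod(a: int, b: int) -> int:
    # C-style remainder: takes the sign of the dividend a
    return a - b * trunc_div(a, b)

def floor_div(a: int, b: int) -> int:
    # step the truncated quotient down by one when the remainder is nonzero
    # and its sign differs from the divisor's
    q, r = trunc_div(a, b), trunc_mod(a, b)
    return q - 1 if r != 0 and (r < 0) != (b < 0) else q

def floor_mod(a: int, b: int) -> int:
    # floored remainder: takes the sign of the divisor b (like Python's %)
    r = trunc_mod(a, b)
    return r + b if r != 0 and (r < 0) != (b < 0) else r

print(trunc_div(-7, 2), trunc_mod(-7, 2), floor_div(-7, 2), floor_mod(-7, 2))  # -3 -1 -4 1
```

The correction only fires when the remainder is nonzero and its sign disagrees with the divisor's. For non-negative operands, the truncated and floored results are the same.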
Christopher Milan
1c8cb0769a am: autogen asic_regs (#16004) 2026-05-04 22:52:07 -04:00
George Hotz
26406bed83 amd uses .valid, not index src valid (#16042) 2026-05-04 18:35:15 -07:00
Christopher Milan
8e99c4f097 fetch checks sha256 (#16037) 2026-05-04 16:08:38 -04:00
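The commit message only says that fetch now checks sha256. The generic sketch below uses Python's hashlib and does not reflect tinygrad's fetch signature, which this log does not show. It hashes a download chunk by chunk and compares the result with an expected hex digest. The function names are hypothetical.

```python
import hashlib
from typing import BinaryIO, Iterable, Iterator

def read_chunks(f: BinaryIO, size: int = 1 << 20) -> Iterator[bytes]:
    # stream a binary file in fixed-size chunks so large downloads never sit fully in memory
    while chunk := f.read(size):
        yield chunk

def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()

def check_sha256(chunks: Iterable[bytes], expected_hex: str) -> str:
    # raise instead of silently accepting a corrupted or tampered file
    actual = sha256_of_chunks(chunks)
    if actual != expected_hex.lower():
        raise ValueError(f"sha256 mismatch: expected {expected_hex}, got {actual}")
    return actual
```

Typical use is `with open(path, "rb") as f: check_sha256(read_chunks(f), expected)`. Hashing incrementally gives the same digest as hashing the whole file at once.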
George Hotz
1884f67a39 simplify full_rewrite_to_sink spec (#16035)
* simplify full_rewrite_to_sink spec

* test cleanups
2026-05-04 11:44:13 -07:00
qazal
b1d88ebf02 viz/cli: aggregate flops in -t (#16031)
* 38

* plumbing

* more flops

* flop/s and bytes/s

* arithmetic mean

* tests

* harmonic mean

* range

* better

* simplify

* fix prints

* no string parsing needed
2026-05-04 17:35:02 +03:00
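The -t aggregation commit lists both an arithmetic and a harmonic mean. Some background, which is not the viz CLI's code: when per-kernel rates such as flop/s are combined, total flops divided by total time equals the flop-weighted harmonic mean of the rates. A plain arithmetic mean of the rates, by contrast, over-weights short, fast kernels. The (flops, seconds) samples below are made-up inputs, and the function names are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (flops, seconds) for one kernel; illustrative values only

def total_rate(samples: List[Sample]) -> float:
    # aggregate throughput: all work divided by all time
    return sum(f for f, _ in samples) / sum(t for _, t in samples)

def weighted_harmonic_mean_rate(samples: List[Sample]) -> float:
    # harmonic mean of per-kernel rates, weighted by each kernel's flops
    rates = [f / t for f, t in samples]
    total_flops = sum(f for f, _ in samples)
    return total_flops / sum(f / r for (f, _), r in zip(samples, rates))

def arithmetic_mean_rate(samples: List[Sample]) -> float:
    # unweighted mean of rates; ignores how long each kernel actually ran
    return sum(f / t for f, t in samples) / len(samples)

samples = [(100.0, 1.0), (100.0, 0.1)]
print(total_rate(samples), weighted_harmonic_mean_rate(samples), arithmetic_mean_rate(samples))
```

For these inputs, the first two values are both about 181.8, while the arithmetic mean is about 550. The two agree because the sum of f/r over kernels is the total time.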
qazal
c02e390c2b viz: encode flops, mem and metadata in json (#16032)
* gate print

* update everywhere to check path

* server encodes json

* ui changes

* cli changes

* tests never need regex

* no str replace

* update test_pipes

* remove that
2026-05-04 23:06:18 +09:00
bigyoshi
4024d8438f runtime/graph: avoid core_id runtimevar merge conflicts (#16026)
Co-authored-by: bigyoshi51 <269989564+bigyoshi51@users.noreply.github.com>
2026-05-03 19:16:02 +03:00
qazal
9684334dfe viz: fix flops in graph, add null graph tracing (#16024)
* min repro, todos

* null graph tracing

* work

* work

* work

* only test_flops

* exec points back

* first

* better

* integral timestamps maybe

* cleanup

* simpler, update NULL to use SDMA naming

* integration test

* sdma
2026-05-03 22:32:44 +09:00
wozeparrot
419d525553 feat: handle multioutput kernel grads (#16028) 2026-05-02 22:31:45 -07:00
qazal
7daf4b7d52 viz: split cli test (#16015)
* viz: split cli test

* arg3 is msg
2026-05-03 01:47:11 +09:00
chenyu
782d1ff80f Tensor.fmod (#16014)
c-style mod matches torch
2026-05-01 16:02:18 -04:00
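"C-style mod" refers to the sign convention of the remainder. The plain-Python comparison below uses only the standard library, not tinygrad. math.fmod follows C's fmod, so its result takes the sign of the dividend. Python's % is a floored modulo, so its result takes the sign of the divisor. The helper name compare_mods is hypothetical.

```python
import math
from typing import Tuple

def compare_mods(a: float, b: float) -> Tuple[float, float]:
    # (C-style fmod, Python floored %) for the same operands
    return math.fmod(a, b), a % b

for a, b in [(7.0, 3.0), (-7.0, 3.0), (7.0, -3.0), (-7.0, -3.0)]:
    print(a, b, *compare_mods(a, b))
```

The two conventions disagree whenever the operands have opposite signs. For example, math.fmod(-7.0, 3.0) is -1.0 while -7.0 % 3.0 is 2.0. PyTorch documents torch.fmod as following C++ std::fmod and torch.remainder as following Python's modulus, which is consistent with the commit's note that the C-style version matches torch.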
qazal
8b147a9ed5 minimal repro for llama copies 2 (#16011) 2026-05-01 22:23:47 +09:00
qazal
a29dd7b19b Revert "cleanup: untrack wait Metal buffers (#15954)" (#16010)
* Revert "cleanup: untrack wait Metal buffers (#15954)"

This reverts commit 5eb1fd5d3c.

* regression test fixes
2026-05-01 21:18:19 +09:00
qazal
65879fe1b7 metal synchronize regression test (#16008)
* add test for metal wait=True

* add self.assertRaises
2026-05-01 20:10:57 +09:00
nimlgen
f6d92b55e6 am: use per pipe reset for gfx11+ (#16006) 2026-05-01 12:56:43 +03:00
George Hotz
4506688285 split render to render.py (#16002)
* split render to render.py

* move more print
2026-04-30 19:41:14 -07:00
chenyu
52c92e15ae no replacement multinomial (#15995)
* no replacement multinomial

Efraimidis–Spirakis

* num_samples == 1 can use fast path
2026-04-30 17:35:26 -04:00
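This commit names the Efraimidis–Spirakis method for weighted sampling without replacement. Each item i draws u_i uniform in (0, 1] and gets the key u_i^(1/w_i), and the k largest keys form the sample. The minimal plain-Python sketch below is not tinygrad's Tensor.multinomial code. It ranks by log(u_i)/w_i, a monotone transform of the same key that avoids underflow for small weights. The function name and the choice to skip non-positive weights are assumptions.

```python
import heapq
import math
import random
from typing import List, Optional, Sequence

def weighted_sample_without_replacement(weights: Sequence[float], k: int,
                                        rng: Optional[random.Random] = None) -> List[int]:
    rng = rng or random.Random()
    # items with non-positive weight can never be drawn (assumption for this sketch)
    positive = [(i, w) for i, w in enumerate(weights) if w > 0]
    if not 0 <= k <= len(positive):
        raise ValueError("k must be between 0 and the number of positive weights")
    keys = []
    for i, w in positive:
        u = 1.0 - rng.random()              # uniform in (0, 1], so log(u) is finite
        keys.append((math.log(u) / w, i))   # log of u ** (1 / w); larger key wins
    # the k largest keys are the sample, returned in order of decreasing key
    return [i for _, i in heapq.nlargest(k, keys)]

print(weighted_sample_without_replacement([1.0, 2.0, 3.0, 4.0], 2, random.Random(0)))
```

With num_samples == 1, sampling with and without replacement coincide. That is presumably why the commit can route that case to an ordinary fast path.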
chenyu
e0b09f288f input validation for rand functions (#15990) 2026-04-30 14:00:44 -04:00
nimlgen
11e1a2b89f cleaner and faster run_linear (#15987)
* cleaner and faster run_linear

* x

* assert for now

* x

* x

* sym_infer

* remove sink
2026-04-30 20:15:22 +03:00
qazal
58b34e71bd failing test for llama useless copies (#15989) 2026-05-01 00:55:29 +09:00
George Hotz
0f7e296f5b fix some indexing edge cases (#15988) 2026-04-30 08:05:30 -07:00
qazal
55915584e5 viz: fix cfg for emulated amd on the null device (#15976)
* simple failing when i test it end to end

* pass

* linter

* assemble
2026-04-30 05:18:09 +09:00
nimlgen
dfd2d07005 remove CompiledRunner (#15970)
* rm usage of CompiledRunner

* more tests

* last

* linter

* sink

* remove

* linter
2026-04-29 22:45:48 +03:00
qazal
a37b605523 remove arch from asm kernel class (#15977)
* rm arch from kernel

* update other tests

* update abstractions4.py
2026-04-30 03:39:52 +09:00
chenyu
654e611a29 _bits_to_rand to mixin (#15972) 2026-04-29 13:47:25 -04:00
George Hotz
5f441ecffc unify reduce + reduce_axis (#15973)
* unify reduce + reduce_axis

* fix all tests

* lil cleanups
2026-04-29 10:29:56 -07:00
nimlgen
7787f76dcc get_runner -> get_runtime (#15967)
* get_runner -> get_runtime

* do not use get_runner

* fix

* remove get_tunner

* remove

* fix

* x
2026-04-29 18:29:49 +03:00
chenyu
fb188c3c23 UOp.bitcast noop early return (#15968)
matches Tensor
2026-04-29 09:41:40 -04:00
chenyu
c4bea54e9c _threefry_random_bits to mixin (#15959)
start RandMixin
2026-04-28 19:13:57 -04:00
Nino Risteski
5eb1fd5d3c cleanup: untrack wait Metal buffers (#15954) 2026-04-28 12:54:59 -07:00
nimlgen
77965a22e5 local optimize as rewrite (#15953)
* local optimize as rewrite

* better

* x

* slightly rename

* fix

* ugh

* remove

* x

* remove

* not weak
2026-04-28 22:51:04 +03:00
qazal
54f00e1013 sqtt: correct rdna4 structs (#15948) 2026-04-28 07:35:50 +09:00
qazal
c58fd85a99 sqtt: add needs_rocprof decorator (#15947)
* sqtt: add needs_rocprof decorator

* version string
2026-04-28 06:22:50 +09:00
chenyu
77f9125c21 move Tensor.pad to OpMixin (#15946) 2026-04-27 16:56:04 -04:00
nimlgen
4164666c72 programinfo (#15942)
* programinfo

* fix

* m

* x

* x

* changes

* x

* fix

* rm
2026-04-27 23:12:03 +03:00
chenyu
fe38d6de94 _pad_circular and _pad_reflect_replicate to mixin (#15944) 2026-04-27 16:07:05 -04:00
qazal
8c174bdad4 viz/sqtt: correct exec pipes (#15885)
* wmma

* p2

* test

* left

* work

* pickle

* handwritten failing tests

* start work

* test the pipes

* empirical evidence

* update rdna4 enum types

* VALU pipe 1

* TRANSCENDENTAL pipe

* transcendental function units

* reorder

* wmma pipe

* cleanup and notes

* smaller

* work

* diff cleanup

* pickle

* use se:1

* int
2026-04-28 05:05:49 +09:00
nimlgen
96165ff0d1 validate_with_cpu as rewrite (#15938)
* validate_with_cpu as rewrite

* compil

* x

* linter

* moved

* fix
2026-04-26 19:58:53 +03:00
nimlgen
117e9e22dd estimates from graph (#15937)
* estimates from graph

* test

* x
2026-04-26 18:22:53 +03:00
nimlgen
bb652352c7 remove execitem (#15932)
* remove execitem

* f

* x
2026-04-25 19:33:04 +03:00
nimlgen
e0ff6cc15c remove old schedule (#15930)
* remove old schedule

* tests

* r

* x
2026-04-25 16:46:36 +03:00