George Hotz
a120709671
tighten shape spec for broadcasting ( #16206 )
* tighten shape spec for broadcasting
* use IndexError, not ValueError
* needs size
2026-05-18 22:12:04 -07:00
George Hotz
3f2d401464
all tests pass with NOOPT=1 ( #16257 )
* all tests pass with NOOPT=1
* fix a few more
* noopt 100% pass
* noopt 100% pass
2026-05-18 20:39:51 -07:00
chenyu
e694d7f222
more deviceless const prerequisites [pr] ( #16256 )
* more deviceless const prerequisites [pr]
* remove that
* arange.contiguous -> arange.clone in tests
arange will become deviceless const soon, update tests where it needs to be a buffer
2026-05-18 23:14:12 -04:00
chenyu
8631b6f17d
remove use of requires_grad in test/ ( #16237 )
2026-05-16 17:21:07 -04:00
chenyu
d62c1d83c0
remove Tensor.eye override ( #16219 )
* remove Tensor.eye override
was only needed for requires_grad arg
* README
2026-05-15 15:40:34 -04:00
George Hotz
83ec66da34
fix a fastdiv edge case ( #16199 )
2026-05-14 13:12:18 -07:00
George Hotz
3b8cc31759
disable fast idiv by default, it's broken ( #16197 )
* disable fast idiv by default, it's broken
* fix fast idiv tests
2026-05-14 11:48:27 -07:00
chenyu
32138c2418
svd to mixin ( #16175 )
2026-05-12 22:29:01 -04:00
chenyu
420a08c6d1
qr to mixin ( #16173 )
2026-05-12 21:23:25 -04:00
chenyu
25583f6dc1
fix cumsum dtype for 0d input ( #16164 )
2026-05-12 14:18:08 -04:00
qazal
5cd0494b14
viz: canonicalize ast for schedule to codegen linking ( #16154 )
* simple failing test
* always null device
* viz: canonicalize ast for schedule to codegen linking
* SCACHE
2026-05-12 22:40:21 +09:00
George Hotz
8294d105a7
Update the spec in spec.py to match the current state ( #16132 )
* start work on specv2
* more spec
* more spec
* fix amd emulator
* more spec
* more
* fix test_uop_graph
* move those
* spec=2
* skip those questionable tests
* ptx fix
* more spec=2
* store
* allow custom function in tensor
* spec 2
* fix beam search for tensor cores
* delete the old specs
* fix import
2026-05-11 20:07:47 -07:00
qazal
2dd84416bf
viz/cli: schedule renderer ( #16101 )
* simpler steps
* work
* work
* iterate
* faster
* better
* simplify more
* sys stdin
* less
* work
* work and mv
* better
* seen bufs
* all call graphs
* print query
* ux
* param to buffer / buffer_view
* work
* respect NO_COLOR in uop_to_json
* less
* render uops
* rm custom renderer
* call can't pyrender.
* unrelated diff
* assert
* 5
2026-05-11 01:56:16 +09:00
George Hotz
daed602569
rename BUFFERIZE to STAGE ( #16125 )
2026-05-10 09:26:46 -07:00
qazal
39ce780907
viz/cli: emit all runs of selected kernel, json fixes ( #16124 )
* keep print
* --json in tests, sqtt --json err
* work
* import
* less
* line
2026-05-10 21:45:51 +09:00
qazal
51c7dafb0d
split viz cli test helpers ( #16123 )
2026-05-10 19:42:24 +09:00
chenyu
235044c9d8
Ops.IDIV -> Ops.CDIV, Ops.MOD -> Ops.CMOD ( #16093 )
* Ops.IDIV -> Ops.CDIV, Ops.MOD -> Ops.CMOD
* ruff
2026-05-07 23:18:15 -04:00
wozeparrot
d11f4d0ec2
fix: don't copy on slice of DP weight ( #16089 )
2026-05-07 17:58:01 -07:00
George Hotz
b796bbae87
fix valid in indexing tests ( #16087 )
2026-05-07 14:11:28 -07:00
chenyu
072db9924c
div to mixin ( #16078 )
also deleted idiv method
2026-05-07 12:52:37 -04:00
chenyu
516b00e286
mod and fmod to mixin ( #16077 )
2026-05-07 12:13:39 -04:00
chenyu
ef085304bc
stronger divmod_recombine ( #16066 )
2026-05-06 15:41:54 -04:00
chenyu
af4140f3be
fix divmod recombine for floordiv ( #16062 )
2026-05-06 14:22:42 -04:00
chenyu
c6ad3d3ac2
better divmod late rewrite ( #16061 )
better order
2026-05-06 11:31:48 -04:00
chenyu
aaabe42373
relax fold_divmod_general ( #16058 )
2026-05-05 21:37:56 -04:00
chenyu
869eae6b37
fix double div rewrites ( #16054 )
2026-05-05 19:34:35 -04:00
qazal
795501e1da
fix device in null graph events ( #16053 )
* failing test
* fix compute
* fix sdma
2026-05-06 07:44:08 +09:00
chenyu
34fe37d64e
use FLOORDIV and FLOORMOD ( #16048 )
* use FLOORDIV and FLOORMOD
also removed CORRECT_DIVMOD_FOLDING
* fix
* Revert "fix"
This reverts commit 86af33b88ef31943c61e67189b072eca4896409a.
* fix
* fix
2026-05-05 18:32:54 -04:00
chenyu
9c37a0c75d
Ops.FLOORDIV and Ops.FLOORMOD ( #16038 )
* Ops.FLOORDIV and Ops.FLOORMOD
lowered into IDIV and MOD in get_late_rewrite_patterns
* still need this
* exclude
* like that?
2026-05-05 11:42:14 -04:00
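Note on the commit above: the message says FLOORDIV and FLOORMOD are lowered into IDIV and MOD. The sketch below illustrates how a floor-style division and modulo can be expressed in terms of C-style (truncating) ones. It is only an illustration of the arithmetic, not the rewrite in get_late_rewrite_patterns; the helper names cdiv, cmod, floordiv_via_cdiv and floormod_via_cmod are hypothetical and do not come from the repository.

```python
def cdiv(a: int, b: int) -> int:
    """C-style integer division: truncates toward zero (hypothetical helper)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def cmod(a: int, b: int) -> int:
    """C-style remainder: the result takes the sign of the dividend."""
    return a - b * cdiv(a, b)


def floordiv_via_cdiv(a: int, b: int) -> int:
    """Floor division built from the truncating primitives."""
    q, r = cdiv(a, b), cmod(a, b)
    # Truncation rounds toward zero; when the remainder is nonzero and its
    # sign differs from the divisor's, the floor is one step lower.
    return q - 1 if r != 0 and (r < 0) != (b < 0) else q


def floormod_via_cmod(a: int, b: int) -> int:
    """Floor modulo built from the truncating remainder."""
    r = cmod(a, b)
    # Shift the remainder so that it takes the sign of the divisor.
    return r + b if r != 0 and (r < 0) != (b < 0) else r


if __name__ == "__main__":
    for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
        print(a, b, cdiv(a, b), cmod(a, b),
              floordiv_via_cdiv(a, b), floormod_via_cmod(a, b))
```

The two families agree whenever both operands share a sign or the division is exact; they differ only for mixed-sign inputs with a nonzero remainder, which is the case the adjustment handles.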
Christopher Milan
8e99c4f097
fetch checks sha256 ( #16037 )
2026-05-04 16:08:38 -04:00
George Hotz
1884f67a39
simplify full_rewrite_to_sink spec ( #16035 )
* simplify full_rewrite_to_sink spec
* test cleanups
2026-05-04 11:44:13 -07:00
qazal
b1d88ebf02
viz/cli: aggregate flops in -t ( #16031 )
* 38
* plumbing
* more flops
* flop/s and bytes/s
* arithmetic mean
* tests
* harmonic mean
* range
* better
* simplify
* fix prints
* no string parsing needed
2026-05-04 17:35:02 +03:00
qazal
c02e390c2b
viz: encode flops, mem and metadata in json ( #16032 )
* gate print
* update everywhere to check path
* server encodes json
* ui changes
* cli changes
* tests never need regex
* no str replace
* update test_pipes
* remove that
2026-05-04 23:06:18 +09:00
qazal
9684334dfe
viz: fix flops in graph, add null graph tracing ( #16024 )
* min repro, todos
* null graph tracing
* work
* work
* work
* only test_flops
* exec points back
* first
* better
* integral timestamps maybe
* cleanup
* simpler, update NULL to use SDMA naming
* integration test
* sdma
2026-05-03 22:32:44 +09:00
qazal
7daf4b7d52
viz: split cli test ( #16015 )
* viz: split cli test
* arg3 is msg
2026-05-03 01:47:11 +09:00
George Hotz
0f7e296f5b
fix some indexing edge cases ( #15988 )
2026-04-30 08:05:30 -07:00
qazal
55915584e5
viz: fix cfg for emulated amd on the null device ( #15976 )
* simple failing when i test it end to end
* pass
* linter
* assemble
2026-04-30 05:18:09 +09:00
qazal
a37b605523
remove arch from asm kernel class ( #15977 )
* rm arch from kernel
* update other tests
* update abstractions4.py
2026-04-30 03:39:52 +09:00
chenyu
654e611a29
_bits_to_rand to mixin ( #15972 )
2026-04-29 13:47:25 -04:00
George Hotz
5f441ecffc
unify reduce + reduce_axis ( #15973 )
* unify reduce + reduce_axis
* fix all tests
* lil cleanups
2026-04-29 10:29:56 -07:00
nimlgen
7787f76dcc
get_runner -> get_runtime ( #15967 )
* get_runner -> get_runtime
* do not use get_runner
* fix
* remove get_tunner
* remove
* fix
* x
2026-04-29 18:29:49 +03:00
chenyu
fb188c3c23
UOp.bitcast noop early return ( #15968 )
matches Tensor
2026-04-29 09:41:40 -04:00
chenyu
c4bea54e9c
_threefry_random_bits to mixin ( #15959 )
start RandMixin
2026-04-28 19:13:57 -04:00
chenyu
77f9125c21
move Tensor.pad to OpMixin ( #15946 )
2026-04-27 16:56:04 -04:00
nimlgen
4164666c72
programinfo ( #15942 )
* programinfo
* fix
* m
* x
* x
* changes
* x
* fix
* rm
2026-04-27 23:12:03 +03:00
chenyu
fe38d6de94
_pad_circular and _pad_reflect_replicate to mixin ( #15944 )
2026-04-27 16:07:05 -04:00
nimlgen
bb652352c7
remove execitem ( #15932 )
* remove execitem
* f
* x
2026-04-25 19:33:04 +03:00
qazal
9a23de7d27
viz/cli: unify profile and rewrites, -s ALL default ( #15931 )
* work
* workg
* better
* cleanup
* better defaults
* --ls
* better
* work
* update llama
* update
2026-04-25 22:31:24 +09:00
nimlgen
a5e9ea7a60
remove schedule batch 4 ( #15927 )
* remove schedule batch 4
* fini
2026-04-25 12:36:55 +03:00
nimlgen
3c8a2db870
remove schedule() from tests batch 2 ( #15923 )
* remove schedule() from tests batch 2
* batch 4
2026-04-25 10:44:41 +03:00