226 Commits

Author SHA1 Message Date
chenyu
f147791105 update test to reset and test kernel_count directly (#14832) 2026-02-17 11:48:46 -05:00
chenyu
f2f039cc0f fix chained full-buffer assign (#14828)
This exposes an issue where pm_remove_bufferize drops tags; that will be fixed in bufferize next. This also fixes rand differing between jit and no-jit.
2026-02-17 09:11:04 -05:00
George Hotz
ff60dab622 Revert "big sink is on base (#14819)" (#14825)
This reverts commit 5fc3d8109f.
2026-02-17 19:18:06 +08:00
George Hotz
5fc3d8109f big sink is on base (#14819)
* big sink is on base

* contiguous fixes tests
2026-02-17 18:32:56 +08:00
qazal
f590564bf7 gemm multiple is only for cdna4 asm (#14814)
* gemm multiple is only for cdna4 asm

* move to backend

* and arch

* path
2026-02-17 14:00:02 +09:00
chenyu
f290af6c7d test_schedule always test with SPLIT_REDUCEOP=0 (#14802)
* test_schedule always test with SPLIT_REDUCEOP=0

except tests that test SPLIT_REDUCEOP=1

* like that
2026-02-16 15:30:26 -05:00
Nicolas Pinto
20b658b786 fuse MULACC after MUL->SHL (#14788)
* decompositions: fuse (x << n) + c to MULACC

MUL→SHL converts x*(2^n) to x<<n before MULACC can fuse (x*c)+y.
Add pattern to also fuse (x<<n)+c → MULACC(x, 2^n, c) for backends
that support both MULACC and SHL.

* test: add test_mulacc_shl for SHL->MULACC fusion

* test: relax test_mulacc_unrolled to >= 4

SHL->MULACC fusion now also catches power-of-2 address calculations,
increasing MULACC count from 4 to 6 on PTX. the test's intent is that
each unrolled multiply is individually fused (not grouped), so >= 4
is the correct assertion.

---------

Co-authored-by: Prithvish <deformercoding@gmail.com>
Co-authored-by: Nicolas Pinto <41171+npinto@users.noreply.github.com>
Co-authored-by: Nicolas Pinto <npinto@mbp23.local>
2026-02-16 16:26:44 +08:00
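The rewrite described in 20b658b786 can be illustrated with a toy expression tree. This is only a sketch under stated assumptions: the tuple-based nodes and the `fuse_shl_add` and `evaluate` helpers are hypothetical and are not tinygrad's pattern matcher or UOp types. It shows that `(x << n) + c` and `MULACC(x, 2^n, c)` compute the same value.

```python
# Toy expression nodes (hypothetical, not tinygrad UOps):
#   ("var", name), ("const", value), ("shl", x, n), ("add", a, b), ("mulacc", a, b, c)

def fuse_shl_add(expr):
    """Rewrite (x << n) + c into mulacc(x, 2**n, c), working bottom-up."""
    op = expr[0]
    if op in ("var", "const"):
        return expr
    # Rewrite children first so nested patterns are fused too.
    args = [fuse_shl_add(a) for a in expr[1:]]
    if op == "add":
        # Addition commutes, so try the shift on either side.
        for shifted, other in ((args[0], args[1]), (args[1], args[0])):
            if shifted[0] == "shl" and shifted[2][0] == "const":
                x, n = shifted[1], shifted[2][1]
                return ("mulacc", x, ("const", 1 << n), other)
    return (op, *args)

def evaluate(expr, env):
    """Evaluate a toy expression with integer variables taken from env."""
    op = expr[0]
    if op == "var":
        return env[expr[1]]
    if op == "const":
        return expr[1]
    vals = [evaluate(a, env) for a in expr[1:]]
    if op == "shl":
        return vals[0] << vals[1]
    if op == "add":
        return vals[0] + vals[1]
    if op == "mulacc":
        return vals[0] * vals[1] + vals[2]
    raise ValueError(f"unknown op: {op}")

expr = ("add", ("shl", ("var", "x"), ("const", 3)), ("var", "c"))
fused = fuse_shl_add(expr)
```

With `x=5` and `c=7`, evaluating `expr` and `fused` gives the same integer, which is the property the fusion relies on.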
qazal
8e7c5f5b09 remove Tensor.training = True in test_arange (#14781) 2026-02-16 11:19:42 +09:00
qazal
156b6cb7e4 native bf16 cast in cdna4 (#14574)
* native bf16 cast in cdna4

* don't need contig backward

* simpler

* contig bw still wins in those cases
2026-02-16 10:51:32 +09:00
chenyu
352845d8cc update cast to uint tests (#14768)
A result in the valid range should work. Add an intermediate cast to NIRRenderer since it's UB for [128, 256).
2026-02-15 10:55:13 -05:00
qazal
ceccc8eb86 unskip now passing multi tests [pr] (#14759) 2026-02-15 20:30:00 +09:00
qazal
42b6bf0b7a fix sdpa causal failing test on multi (#14762)
* simple failing test

* device is from xq
2026-02-15 16:54:33 +09:00
George Hotz
0e215c433d remove hack from cast (#14760)
* remove hack from cast

* skip tests

* linters to 3.12, another skip

* fix rand

* m_
2026-02-15 13:56:38 +08:00
George Hotz
d176af6269 start outerworld call test, fix gate (#14758) 2026-02-15 12:35:01 +08:00
chenyu
ca68037f26 lazy basic setitem to unrealized Tensor (#14756)
Undo the view and make it a mask; this also fuses the setitem with any pending compute.

One behavior change: for a target not backed by a buffer (const and arange), rangeify makes the output contiguous under the hood.
This is strictly better than raising and asking the user to call contiguous, as that would no longer be fusable.
2026-02-14 20:27:03 -05:00
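The "make it a mask" idea in ca68037f26 can be sketched in plain Python. This is an illustration only, not tinygrad code: `lazy_setitem` and the `pending` callable are hypothetical stand-ins, with `pending` playing the role of an unrealized target whose elements are computed on demand. Expressing the slice assignment as a per-element select lets the write and the pending compute happen in a single pass.

```python
def lazy_setitem(pending, size, start, stop, value):
    """Emulate target[start:stop] = value on a target that is not yet computed.

    pending: hypothetical stand-in for unrealized compute, maps index -> element.
    The slice becomes a mask (start <= i < stop), and each output element is
    selected from either the assigned value or the pending computation.
    """
    return [value if start <= i < stop else pending(i) for i in range(size)]

# The target is "arange(5) * 2", never materialized on its own.
result = lazy_setitem(lambda i: i * 2, size=5, start=1, stop=3, value=-1)
```

Here `result` holds the assigned value at indices 1 and 2 and the pending computation everywhere else, without an intermediate copy of the target.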
chenyu
95f4c7e90a fix limit_bufs to not limit index (#14751)
index is not a real buffer. Also made MAX_KERNEL_BUFFERS a ContextVar.
2026-02-14 16:00:03 -05:00
chenyu
8f6772fd8c more setitem kernel mem tests (#14749)
* more setitem kernel mem tests

test that only the slice is accessed

* update
2026-02-14 11:01:03 -05:00
chenyu
446909fb7a more setitem kernel tests (#14748)
check where realize happened
2026-02-14 09:57:46 -05:00
Christopher Milan
eaa9506a00 disallow subnormals in emulated test_dtype (#14744) 2026-02-14 00:11:57 -05:00
chenyu
dca7819f76 more setitem into unrealized tests (#14737)
* more setitem into unrealized tests

into empty, const with alu, and arange

* typo
2026-02-13 20:28:51 -05:00
chenyu
8b205a007e lazy setitem for realized target (#14735) 2026-02-13 12:20:14 -05:00
Christopher Milan
08a555c875 skip test_expand_buffer_before_cast on WEBGPU metal (#14724) 2026-02-13 00:01:05 -05:00
Christopher Milan
c30bb0f006 fix WEBGPU isnan check (#14711) 2026-02-12 17:01:18 -05:00
nimlgen
b376bd7a21 jit: fix raw in same kernel (#14699)
* jit: fix raw in same kernel

* fix

* ugh

* x

* simpler
2026-02-12 15:33:32 +03:00
George Hotz
095a064ba8 test.yml explicitly says backend (#14700)
* test.yml explicitly says backend

* 1e-5
2026-02-12 16:03:44 +08:00
George Hotz
c331798201 move tests to test/backend (#14691)
* move tests to test/backend

* fix imports

* fix CI

* revert that one

* Fix formatting in README for test command
2026-02-12 11:09:44 +08:00