chenyu
f147791105
update test to reset and test kernel_count directly ( #14832 )
2026-02-17 11:48:46 -05:00
chenyu
f2f039cc0f
fix chained full-buffer assign ( #14828 )
...
this shows the issue that pm_remove_bufferize drops tags; will fix in bufferize next. this also fixed rand being different in jit vs no-jit
2026-02-17 09:11:04 -05:00
George Hotz
ff60dab622
Revert "big sink is on base ( #14819 )" ( #14825 )
...
This reverts commit 5fc3d8109f.
2026-02-17 19:18:06 +08:00
George Hotz
5fc3d8109f
big sink is on base ( #14819 )
...
* big sink is on base
* contiguous fixes tests
2026-02-17 18:32:56 +08:00
qazal
f590564bf7
gemm multiple is only for cdna4 asm ( #14814 )
...
* gemm multiple is only for cdna4 asm
* move to backend
* and arch
* path
2026-02-17 14:00:02 +09:00
chenyu
f290af6c7d
test_schedule always test with SPLIT_REDUCEOP=0 ( #14802 )
...
* test_schedule always test with SPLIT_REDUCEOP=0
except tests that test SPLIT_REDUCEOP=1
* like that
2026-02-16 15:30:26 -05:00
Nicolas Pinto
20b658b786
fuse MULACC after MUL->SHL ( #14788 )
...
* decompositions: fuse (x << n) + c to MULACC
MUL→SHL converts x*(2^n) to x<<n before MULACC can fuse (x*c)+y.
Add pattern to also fuse (x<<n)+c → MULACC(x, 2^n, c) for backends
that support both MULACC and SHL.
* test: add test_mulacc_shl for SHL->MULACC fusion
* test: relax test_mulacc_unrolled to >= 4
SHL->MULACC fusion now also catches power-of-2 address calculations,
increasing MULACC count from 4 to 6 on PTX. the test's intent is that
each unrolled multiply is individually fused (not grouped), so >= 4
is the correct assertion.
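The rewrite described above can be illustrated with a toy sketch. This is not tinygrad's pattern matcher: the tuple-based expression nodes and the helper names `fuse_shl_add` and `evaluate` are hypothetical, chosen only to show that `(x << n) + c` and `MULACC(x, 2^n, c)` compute the same value for integer `x` and a constant shift `n`.

```python
# Toy expression nodes (hypothetical, not tinygrad's UOp):
#   ("SHL", x, n)        -> x << n
#   ("ADD", a, b)        -> a + b
#   ("MULACC", a, b, c)  -> a * b + c
# Leaves are either variable names (str) or integer constants (int).

def fuse_shl_add(expr):
    """Rewrite (x << n) + c into MULACC(x, 2**n, c) when n is a constant."""
    if isinstance(expr, tuple) and expr[0] == "ADD":
        a, b = expr[1], expr[2]
        # ADD is commutative, so check both operand orders.
        for lhs, rhs in ((a, b), (b, a)):
            if isinstance(lhs, tuple) and lhs[0] == "SHL" and isinstance(lhs[2], int):
                return ("MULACC", lhs[1], 2 ** lhs[2], rhs)
    return expr  # no match: leave the expression unchanged

def evaluate(expr, env):
    """Evaluate a toy expression with variable bindings from env."""
    if isinstance(expr, str):
        return env[expr]
    if isinstance(expr, int):
        return expr
    op, args = expr[0], [evaluate(a, env) for a in expr[1:]]
    if op == "SHL":
        return args[0] << args[1]
    if op == "ADD":
        return args[0] + args[1]
    if op == "MULACC":
        return args[0] * args[1] + args[2]
    raise ValueError(f"unknown op {op}")

before = ("ADD", ("SHL", "x", 3), "c")
after = fuse_shl_add(before)
```

Here `after` is `("MULACC", "x", 8, "c")`, and both forms evaluate to the same integer for any binding of `x` and `c`.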
---------
Co-authored-by: Prithvish <deformercoding@gmail.com>
Co-authored-by: Nicolas Pinto <41171+npinto@users.noreply.github.com>
Co-authored-by: Nicolas Pinto <npinto@mbp23.local>
2026-02-16 16:26:44 +08:00
qazal
8e7c5f5b09
remove Tensor.training = True in test_arange ( #14781 )
2026-02-16 11:19:42 +09:00
qazal
156b6cb7e4
native bf16 cast in cdna4 ( #14574 )
...
* native bf16 cast in cdna4
* don't need contig backward
* simpler
* contig bw still wins in those cases
2026-02-16 10:51:32 +09:00
chenyu
352845d8cc
update cast to uint tests ( #14768 )
...
a result in the valid range should work; add an intermediate cast to NIRRenderer since it's UB for [128, 256)
2026-02-15 10:55:13 -05:00
qazal
ceccc8eb86
unskip now passing multi tests [pr] ( #14759 )
2026-02-15 20:30:00 +09:00
qazal
42b6bf0b7a
fix sdpa causal failing test on multi ( #14762 )
...
* simple failing test
* device is from xq
2026-02-15 16:54:33 +09:00
George Hotz
0e215c433d
remove hack from cast ( #14760 )
...
* remove hack from cast
* skip tests
* linters to 3.12, another skip
* fix rand
* m_
2026-02-15 13:56:38 +08:00
George Hotz
d176af6269
start outerworld call test, fix gate ( #14758 )
2026-02-15 12:35:01 +08:00
chenyu
ca68037f26
lazy basic setitem to unrealized Tensor ( #14756 )
...
undo the view and make it a mask; this also fuses the setitem with any pending compute.
one behavior change is that for a target not backed by a buffer (const and arange), rangeify makes the output contiguous under the hood.
this is strictly better than raising and asking the user to call contiguous, as that would no longer be fusable.
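One way to picture "make it a mask" is a plain-Python sketch, assuming a 1-D target and a contiguous slice. The function name `setitem_as_mask` is hypothetical and this is not the tinygrad implementation; it only shows a slice assignment expressed as an elementwise select, which is the form that can fuse with other elementwise work.

```python
def setitem_as_mask(target, start, stop, values):
    """Return a copy of target with target[start:stop] replaced by values.

    Instead of writing through a view of the slice, every output element is
    chosen by a mask: take the new value inside [start, stop), otherwise keep
    the old one.
    """
    if len(values) != stop - start:
        raise ValueError("values must match the slice length")
    out = []
    for i, old in enumerate(target):
        in_slice = start <= i < stop  # the mask for position i
        out.append(values[i - start] if in_slice else old)
    return out

base = [0, 1, 2, 3, 4, 5]
result = setitem_as_mask(base, 2, 4, [10, 20])
```

In this sketch `base` is left untouched and `result` holds the updated values, which mirrors the idea that the target does not have to be realized before the assignment is described.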
2026-02-14 20:27:03 -05:00
chenyu
95f4c7e90a
fix limit_bufs to not limit index ( #14751 )
...
index is not a real buffer. also made MAX_KERNEL_BUFFERS a ContextVar
2026-02-14 16:00:03 -05:00
chenyu
8f6772fd8c
more setitem kernel mem tests ( #14749 )
...
* more setitem kernel mem tests
test that only the slice is accessed
* update
2026-02-14 11:01:03 -05:00
chenyu
446909fb7a
more setitem kernel tests ( #14748 )
...
check where realize happened
2026-02-14 09:57:46 -05:00
Christopher Milan
eaa9506a00
disallow subnormals in emulated test_dtype ( #14744 )
2026-02-14 00:11:57 -05:00
chenyu
dca7819f76
more setitem into unrealized tests ( #14737 )
...
* more setitem into unrealized tests
into empty, const with alu, and arange
* typo
2026-02-13 20:28:51 -05:00
chenyu
8b205a007e
lazy setitem for realized target ( #14735 )
2026-02-13 12:20:14 -05:00
Christopher Milan
08a555c875
skip test_expand_buffer_before_cast on WEBGPU metal ( #14724 )
2026-02-13 00:01:05 -05:00
Christopher Milan
c30bb0f006
fix WEBGPU isnan check ( #14711 )
2026-02-12 17:01:18 -05:00
nimlgen
b376bd7a21
jit: fix raw in same kernel ( #14699 )
...
* jit: fix raw in same kernel
* fix
* ugh
* x
* simpler
2026-02-12 15:33:32 +03:00
George Hotz
095a064ba8
test.yml explicitly says backend ( #14700 )
...
* test.yml explicitly says backend
* 1e-5
2026-02-12 16:03:44 +08:00
George Hotz
c331798201
move tests to test/backend ( #14691 )
...
* move tests to test/backend
* fix imports
* fix CI
* revert that one
* Fix formatting in README for test command
2026-02-12 11:09:44 +08:00