Sachith Shetty
74567c1958
fix: pass input device to ONNX helper internal tensors ( #16242 )
* fix: pass input device to onnx methods internal tensors
* test: onnx helper internal tensors use input device
2026-05-19 11:16:33 -07:00
Christopher Milan
a178301dbe
PYTHONREMU: fix CDNA VOP3 conditional writes ( #16258 )
2026-05-19 13:31:31 -04:00
George Hotz
a120709671
tighten shape spec for broadcasting ( #16206 )
* tighten shape spec for broadcasting
* use IndexError, not ValueError
* needs size
2026-05-18 22:12:04 -07:00
George Hotz
3f2d401464
all tests pass with NOOPT=1 ( #16257 )
* all tests pass with NOOPT=1
* fix a few more
* noopt 100% pass
* noopt 100% pass
2026-05-18 20:39:51 -07:00
chenyu
e694d7f222
more deviceless const prerequisites [pr] ( #16256 )
* more deviceless const prerequisites [pr]
* remove that
* arange.contiguous -> arange.clone in tests
arange will become deviceless const soon, update tests where it needs to be a buffer
2026-05-18 23:14:12 -04:00
chenyu
c1076ed56c
Tensor.device and UOp.device can be None ( #16255 )
2026-05-18 22:08:10 -04:00
chenyu
d532b4f533
multi alu with deviceless const ( #16251 )
2026-05-18 19:31:53 -04:00
Christopher Milan
7515824a6d
ci: actually use clang-20, enable bfloat16 ( #16249 )
2026-05-18 19:06:43 -04:00
chenyu
754344087a
assign for deviceless const source ( #16248 )
2026-05-18 17:39:53 -04:00
chenyu
73e6b4963b
to and shard is noop for deviceless uop ( #16247 )
2026-05-18 16:11:10 -04:00
chenyu
db639ebe3e
deviceless const from UOp ( #16243 )
2026-05-18 14:14:12 -04:00
chenyu
5ae4dbd599
make slow tests faster ( #16244 )
2026-05-18 11:42:02 -04:00
chenyu
dcee90aa3f
remove requires_grad use in extra/examples ( #16238 )
except the ones fed into optimizer
2026-05-16 18:40:26 -04:00
chenyu
8631b6f17d
remove use of requires_grad in test/ ( #16237 )
2026-05-16 17:21:07 -04:00
chenyu
0ddc50d050
do not gate backward on requires_grad ( #16230 )
DETACH is filtered in _deepwalk. Instead of None, it gets 0 grad now
2026-05-16 12:29:49 -04:00
qazal
ebcb7b7cc0
fp8 gemm tests with scale args ( #16231 )
* update atol
* update fp8 path
* more work
* update profile.sh
2026-05-16 20:47:58 +09:00
wozeparrot
2d48d7ab09
remove more invalid ( #16227 )
2026-05-16 02:52:27 -07:00
Christopher Milan
79c0ae5b89
metal: arch is GPU family ( #16223 )
2026-05-15 21:22:48 -04:00
chenyu
d62c1d83c0
remove Tensor.eye override ( #16219 )
* remove Tensor.eye override
was only needed for requires_grad arg
* README
2026-05-15 15:40:34 -04:00
chenyu
07a172dbbb
remove noop requires_grad_ calls ( #16213 )
2026-05-15 13:31:10 -04:00
chenyu
c6cf9e8f0c
remove test_svd_nonfull_5_5 ( #16217 )
flaky, kinda overlaps with test_svd_general
2026-05-15 13:10:02 -04:00
chenyu
409bb0c9ad
requires_grad cannot be None ( #16212 )
final goal is to remove requires_grad, first change the default to True, and don't allow None
2026-05-15 02:01:04 -04:00
chenyu
a612b88abb
better assert when setitem a refed tensor ( #16210 )
also decouple from requires_grad
2026-05-14 23:40:29 -04:00
chenyu
a75c14f010
some setitem tests ( #16209 )
2026-05-14 22:36:25 -04:00
Christopher Milan
891a1ae7c2
onnx: remove dtype_fallback ( #15717 )
2026-05-14 22:06:57 -04:00
chenyu
ffa1aac7b1
gradient for STORE/AFTER ala clone ( #16205 )
2026-05-14 20:17:27 -04:00
chenyu
09096ea565
test_gradient_through_clone ( #16203 )
backward through clone crashes now
2026-05-14 19:26:47 -04:00
George Hotz
83ec66da34
fix a fastdiv edge case ( #16199 )
2026-05-14 13:12:18 -07:00
George Hotz
3b8cc31759
disable fast idiv by default, it's broken ( #16197 )
* disable fast idiv by default, it's broken
* fix fast idiv tests
2026-05-14 11:48:27 -07:00
C T
1b779a9058
add gelu approximate="none" (match pytorch) ( #16162 )
* add gelu approximate="none" (match pytorch)
* lint
* pass through onnx Gelu approximate
* type annotate
* explicit math.sqrt
* keep tinygrad's gelu approximate="tanh" default
2026-05-13 18:53:24 -07:00
b1tg
3c806ff406
clean up gguf ( #16160 )
2026-05-12 21:16:10 -07:00
chenyu
38d407fd58
simplify svd more ( #16181 )
all the slowness is scheduling
2026-05-12 23:48:22 -04:00
chenyu
32138c2418
svd to mixin ( #16175 )
2026-05-12 22:29:01 -04:00
chenyu
2172363be5
don't use Tensor indexing in svd ( #16174 )
prepare mixin, also about 4X faster for 8x8 input
2026-05-12 21:56:19 -04:00
chenyu
420a08c6d1
qr to mixin ( #16173 )
2026-05-12 21:23:25 -04:00
chenyu
bdcdf1f1a1
jittable masked_select and nonzero ( #16170 )
* jittable masked_select and nonzero
make jittable with `size=`, matches jax
* COMPILE_ONLY
2026-05-12 16:39:36 -04:00
wozeparrot
a613bcfc6d
allow after on contiguous in spec ( #16169 )
* feat: allow after on contiguous
* feat: add test
2026-05-12 13:11:44 -07:00
chenyu
7c3e3fa154
fix empty input for masked_select and nonzero ( #16168 )
2026-05-12 15:36:51 -04:00
chenyu
da3b7e89a4
atol in test_custom_kernel_multi_output_backward_interacting ( #16166 )
2026-05-12 14:42:12 -04:00
chenyu
25583f6dc1
fix cumsum dtype for 0d input ( #16164 )
2026-05-12 14:18:08 -04:00
George Hotz
64c81dfd24
add all codegen stages to spec_tensor ( #16163 )
2026-05-12 10:35:38 -07:00
chenyu
f3e3c3851f
explicit args to Tensor.rand ( #16161 )
added requires_grad, other kwargs were silently dropped
2026-05-12 12:53:39 -04:00
nimlgen
e5729935c6
time_call ( #16152 )
* time_call
* x
* fix caches
2026-05-12 16:58:28 +03:00
qazal
fe39cf148a
add Ops.SOURCE test ( #16155 )
* simple failing test
* raises
* change
2026-05-12 22:49:32 +09:00
qazal
5cd0494b14
viz: canonicalize ast for schedule to codegen linking ( #16154 )
* simple failing test
* always null device
* viz: canonicalize ast for schedule to codegen linking
* SCACHE
2026-05-12 22:40:21 +09:00
chenyu
09fd80fba6
fix randperm and _multi_like drop requires_grad ( #16150 )
2026-05-11 23:23:34 -04:00
George Hotz
8294d105a7
Update the spec in spec.py to match the current state ( #16132 )
* start work on specv2
* more spec
* more spec
* fix amd emulator
* more spec
* more
* fix test_uop_graph
* move those
* spec=2
* skip those questionable tests
* ptx fix
* more spec=2
* store
* allow custom function in tensor
* spec 2
* fix beam search for tensor cores
* delete the old specs
* fix import
2026-05-11 20:07:47 -07:00
chenyu
3942a80f66
fix wrong kwargs passed into rands ( #16149 )
working towards explicit args for these
2026-05-11 22:22:06 -04:00
Christopher Milan
039d84ff02
Revert "onnx: deduplicate simple proto parsers" ( #16148 )
This reverts commit 83eaefcd0f.
2026-05-11 21:45:17 -04:00
chenyu
63c1f00b80
disable test_svd_general again ( #16146 )
flaky on CI
2026-05-11 19:24:32 -04:00