chenyu
09096ea565
test_gradient_through_clone (#16203)
backward through clone crashes now
2026-05-14 19:26:47 -04:00
George Hotz
83ec66da34
fix a fastdiv edge case (#16199)
2026-05-14 13:12:18 -07:00
George Hotz
3b8cc31759
disable fast idiv by default, it's broken (#16197)
* disable fast idiv by default, it's broken
* fix fast idiv tests
2026-05-14 11:48:27 -07:00
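Fast idiv usually refers to replacing integer division by a constant with a multiply and a shift. As background only, here is a generic textbook-style sketch in Python. It is not tinygrad's implementation, it does not reproduce the edge case fixed in #16199, and the helper names `magic_for` and `fast_udiv` are hypothetical. Choosing s = n_bits + ceil(log2(d)) and m = ceil(2**s / d) makes (x * m) >> s equal x // d for every unsigned x below 2**n_bits.

```python
def magic_for(d: int, n_bits: int) -> tuple[int, int]:
  """Hypothetical helper: return (m, s) so that (x * m) >> s == x // d for all 0 <= x < 2**n_bits."""
  assert d >= 1
  l = (d - 1).bit_length()   # ceil(log2(d)); 0 when d == 1
  s = n_bits + l
  m = -(-(1 << s) // d)      # ceil(2**s / d) using only integer arithmetic
  return m, s

def fast_udiv(x: int, m: int, s: int) -> int:
  # one multiply and one shift instead of a divide
  return (x * m) >> s

if __name__ == "__main__":
  n = 8
  # exhaustive check over every 8-bit dividend and every nonzero 8-bit divisor
  for d in range(1, 1 << n):
    m, s = magic_for(d, n)
    assert all(fast_udiv(x, m, s) == x // d for x in range(1 << n))
  print("all 8-bit cases match")
```

The multiplier can need n_bits + 1 bits, so the product may not fit in a register twice the input width; fixed-width implementations usually need an extra correction step there, which is one typical place for edge cases to hide.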
C T
1b779a9058
add gelu approximate="none" (match pytorch) (#16162)
* add gelu approximate="none" (match pytorch)
* lint
* pass through onnx Gelu approximate
* type annotate
* explicit math.sqrt
* keep tinygrad's gelu approximate="tanh" default
2026-05-13 18:53:24 -07:00
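For reference, the two GELU variants differ only in how the Gaussian CDF is evaluated: approximate="none" uses the exact erf form (PyTorch's default), while "tanh" uses the tanh approximation, which this commit keeps as tinygrad's default. A minimal scalar sketch using only the standard library; the function names are hypothetical and this is not tinygrad's code.

```python
import math

def gelu_none(x: float) -> float:
  # exact GELU: x * Phi(x), where Phi is the standard normal CDF
  return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
  # tanh approximation of the same function
  return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

if __name__ == "__main__":
  for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"{x:+.1f}  none={gelu_none(x):.6f}  tanh={gelu_tanh(x):.6f}")
```

The two agree closely over typical activation ranges, which is why the numeric difference only matters when matching another framework's outputs tightly.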
b1tg
3c806ff406
clean up gguf (#16160)
2026-05-12 21:16:10 -07:00
chenyu
38d407fd58
simplify svd more (#16181)
all the slowness is scheduling
2026-05-12 23:48:22 -04:00
chenyu
32138c2418
svd to mixin (#16175)
2026-05-12 22:29:01 -04:00
chenyu
2172363be5
don't use Tensor indexing in svd (#16174)
prepare mixin, also about 4X faster for 8x8 input
2026-05-12 21:56:19 -04:00
chenyu
420a08c6d1
qr to mixin (#16173)
2026-05-12 21:23:25 -04:00
chenyu
bdcdf1f1a1
jittable masked_select and nonzero (#16170)
* jittable masked_select and nonzero
make jittable with `size=`, matches jax
* COMPILE_ONLY
2026-05-12 16:39:36 -04:00
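The `size=` approach follows JAX: the output length of nonzero/masked_select normally depends on the data, which a JIT cannot capture, so the caller fixes the length up front and the result is truncated or padded to it. A pure-Python sketch of those semantics on 1-D lists; the function names and the `fill_value` padding argument (JAX's `jnp.nonzero` pads with a fill value, 0 by default) are assumptions here, not tinygrad's signature.

```python
from typing import Sequence

def nonzero_static(a: Sequence[float], size: int, fill_value: int = 0) -> list[int]:
  """Hypothetical: indices of nonzero entries, truncated or padded to exactly `size`."""
  idx = [i for i, v in enumerate(a) if v != 0][:size]   # truncate if there are too many
  return idx + [fill_value] * (size - len(idx))         # pad if there are too few

def masked_select_static(a: Sequence[float], mask: Sequence[bool], size: int,
                         fill_value: float = 0.0) -> list[float]:
  """Hypothetical: elements of `a` where `mask` is true, truncated or padded to exactly `size`."""
  vals = [v for v, m in zip(a, mask) if m][:size]
  return vals + [fill_value] * (size - len(vals))

if __name__ == "__main__":
  print(nonzero_static([0, 3, 0, 5], size=3))                                # [1, 3, 0]
  print(masked_select_static([1.0, 2.0, 3.0], [True, False, True], size=4))  # [1.0, 3.0, 0.0, 0.0]
```

Because the output length no longer depends on the data, every call with the same `size` has the same shape, which is what lets a JIT reuse one compiled graph.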
wozeparrot
a613bcfc6d
allow after on contiguous in spec (#16169)
* feat: allow after on contiguous
* feat: add test
2026-05-12 13:11:44 -07:00
chenyu
7c3e3fa154
fix empty input for masked_select and nonzero (#16168)
2026-05-12 15:36:51 -04:00
chenyu
da3b7e89a4
atol in test_custom_kernel_multi_output_backward_interacting (#16166)
2026-05-12 14:42:12 -04:00
chenyu
25583f6dc1
fix cumsum dtype for 0d input (#16164)
2026-05-12 14:18:08 -04:00
George Hotz
64c81dfd24
add all codegen stages to spec_tensor (#16163)
2026-05-12 10:35:38 -07:00
chenyu
f3e3c3851f
explicit args to Tensor.rand (#16161)
added requires_grad, other kwargs were silently dropped
2026-05-12 12:53:39 -04:00
nimlgen
e5729935c6
time_call (#16152)
* time_call
* x
* fix caches
2026-05-12 16:58:28 +03:00
qazal
fe39cf148a
add Ops.SOURCE test (#16155)
* simple failing test
* raises
* change
2026-05-12 22:49:32 +09:00
qazal
5cd0494b14
viz: canonicalize ast for schedule to codegen linking (#16154)
* simple failing test
* always null device
* viz: canonicalize ast for schedule to codegen linking
* SCACHE
2026-05-12 22:40:21 +09:00
chenyu
09fd80fba6
fix randperm and _multi_like drop requires_grad (#16150)
2026-05-11 23:23:34 -04:00
George Hotz
8294d105a7
Update the spec in spec.py to match the current state (#16132)
* start work on specv2
* more spec
* more spec
* fix amd emulator
* more spec
* more
* fix test_uop_graph
* move those
* spec=2
* skip those questionable tests
* ptx fix
* more spec=2
* store
* allow custom function in tensor
* spec 2
* fix beam search for tensor cores
* delete the old specs
* fix import
2026-05-11 20:07:47 -07:00
chenyu
3942a80f66
fix wrong kwargs passed into rands (#16149)
working towards explicit args for these
2026-05-11 22:22:06 -04:00
Christopher Milan
039d84ff02
Revert "onnx: deduplicate simple proto parsers" (#16148)
This reverts commit 83eaefcd0f.
2026-05-11 21:45:17 -04:00
chenyu
63c1f00b80
disable test_svd_general again (#16146)
...
flaky on CI
2026-05-11 19:24:32 -04:00
chenyu
0b02fb6797
Revert "[pr] match torch rmsnorm (#16122)" (#16144)
This reverts commit 692257dd70.
2026-05-11 17:53:42 -04:00
chenyu
fbe8be0b8b
style cleanup to Tensor.qr and svd (#16142)
* style cleanup to Tensor.qr and svd
same kernels
* more
* enable
2026-05-11 17:16:59 -04:00
Joshua James Venter
692257dd70
[pr] match torch rmsnorm (#16122)
* [pr] match rmsnorm torch
Signed-off-by: Joshua James Venter <venter.joshua@gmail.com>
* 1e-5
* ops.md
---------
Signed-off-by: Joshua James Venter <venter.joshua@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-05-11 14:36:41 -04:00
nimlgen
70c2480e71
hcq2 to extra (#16126)
* hcq2 in extra
* correct
* some revert from non-extra
* cln
* cpu
* x
* attach
* min
* remove attach
* linter
2026-05-11 17:17:30 +03:00
nimlgen
ad9738892c
get_buf() for Buffer (#16134)
* p
* mypy
* x
2026-05-11 16:36:14 +03:00
qazal
2dd84416bf
viz/cli: schedule renderer (#16101)
* simpler steps
* work
* work
* iterate
* faster
* better
* simplify more
* sys stdin
* less
* work
* work and mv
* better
* seen bufs
* all call graphs
* print query
* ux
* param to buffer / buffer_view
* work
* respect NO_COLOR in uop_to_json
* less
* render uops
* rm custom renderer
* call can't pyrender.
* unrelated diff
* assert
* 5
2026-05-11 01:56:16 +09:00
George Hotz
daed602569
rename BUFFERIZE to STAGE (#16125)
2026-05-10 09:26:46 -07:00
qazal
39ce780907
viz/cli: emit all runs of selected kernel, json fixes (#16124)
* keep print
* --json in tests, sqtt --json err
* work
* import
* less
* line
2026-05-10 21:45:51 +09:00
qazal
51c7dafb0d
split viz cli test helpers (#16123)
2026-05-10 19:42:24 +09:00
Pawan
4dd6ad3514
gradient: add TRUNC backward (#15925)
* gradient: add TRUNC backward
* test: move round quantization gradient to test_ops
2026-05-08 16:27:55 -07:00
chenyu
235044c9d8
Ops.IDIV -> Ops.CDIV, Ops.MOD -> Ops.CMOD (#16093)
* Ops.IDIV -> Ops.CDIV, Ops.MOD -> Ops.CMOD
* ruff
2026-05-07 23:18:15 -04:00
June
83eaefcd0f
onnx: deduplicate simple proto parsers (#16085)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-05-07 18:44:27 -07:00
wozeparrot
d11f4d0ec2
fix: don't copy on slice of DP weight (#16089)
2026-05-07 17:58:01 -07:00
George Hotz
b796bbae87
fix valid in indexing tests (#16087)
2026-05-07 14:11:28 -07:00
wozeparrot
4d1a9dca41
fix: don't copy precompiled custom kernel outputs (#16084)
2026-05-07 14:02:38 -07:00
chenyu
072db9924c
div to mixin (#16078)
also deleted idiv method
2026-05-07 12:52:37 -04:00
chenyu
516b00e286
mod and fmod to mixin (#16077)
2026-05-07 12:13:39 -04:00
chenyu
ef085304bc
stronger divmod_recombine (#16066)
2026-05-06 15:41:54 -04:00
chenyu
af4140f3be
fix divmod recombine for floordiv (#16062)
2026-05-06 14:22:42 -04:00
chenyu
c6ad3d3ac2
better divmod late rewrite (#16061)
better order
2026-05-06 11:31:48 -04:00
chenyu
aaabe42373
relax fold_divmod_general (#16058)
2026-05-05 21:37:56 -04:00
chenyu
869eae6b37
fix double div rewrites (#16054)
2026-05-05 19:34:35 -04:00
qazal
795501e1da
fix device in null graph events (#16053)
* failing test
* fix compute
* fix sdma
2026-05-06 07:44:08 +09:00
chenyu
34fe37d64e
use FLOORDIV and FLOORMOD (#16048)
* use FLOORDIV and FLOORMOD
also removed CORRECT_DIVMOD_FOLDING
* fix
* Revert "fix"
This reverts commit 86af33b88ef31943c61e67189b072eca4896409a.
* fix
* fix
2026-05-05 18:32:54 -04:00
nimlgen
5fa0016ffc
supports_exec_item -> supports_uop (#16033)
2026-05-05 22:41:13 +03:00
chenyu
9c37a0c75d
Ops.FLOORDIV and Ops.FLOORMOD (#16038)
* Ops.FLOORDIV and Ops.FLOORMOD
lowered into IDIV and MOD in get_late_rewrite_patterns
* still need this
* exclude
* like that?
2026-05-05 11:42:14 -04:00
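Ops.FLOORDIV and Ops.FLOORMOD round toward negative infinity, while the C-style IDIV and MOD they are lowered into (renamed Ops.CDIV and Ops.CMOD in #16093) truncate toward zero. One standard way to express the floor versions with the truncating ones: when the truncated remainder is nonzero and its sign differs from the divisor's, subtract 1 from the quotient and add the divisor to the remainder. A minimal Python sketch of that identity; the helper names are hypothetical, and the actual patterns in get_late_rewrite_patterns may be structured differently.

```python
def cdiv(a: int, b: int) -> int:
  # C-style integer division: truncates toward zero
  q = abs(a) // abs(b)
  return q if (a >= 0) == (b > 0) else -q

def cmod(a: int, b: int) -> int:
  # C-style remainder takes the sign of the dividend, so a == b * cdiv(a, b) + cmod(a, b)
  return a - b * cdiv(a, b)

def floordiv(a: int, b: int) -> int:
  # floor division from truncating division: step down when the remainder's sign disagrees with b
  q, r = cdiv(a, b), cmod(a, b)
  return q - 1 if r != 0 and (r < 0) != (b < 0) else q

def floormod(a: int, b: int) -> int:
  # floor remainder takes the sign of the divisor
  r = cmod(a, b)
  return r + b if r != 0 and (r < 0) != (b < 0) else r

if __name__ == "__main__":
  for a, b in ((7, 2), (-7, 2), (7, -2), (-7, -2)):
    print(a, b, cdiv(a, b), cmod(a, b), floordiv(a, b), floormod(a, b))
```

When the divisor is known to be positive, the sign check reduces to r < 0, so the correction is a single compare and select.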