George Hotz
770dac0e0d
broadcast
2026-05-14 17:04:37 -07:00
George Hotz
b827858479
broadcast shape
2026-05-14 17:01:20 -07:00
chenyu
09096ea565
test_gradient_through_clone ( #16203 )
...
backward through clone crashes now
2026-05-14 19:26:47 -04:00
George Hotz
d4dcd8487b
aggressive shape check to prepare for broadcasting ( #16202 )
...
* add implicit broadcasting to shape
* NOOP/ALLREDUCE fixes
2026-05-14 16:15:44 -07:00
George Hotz
83ec66da34
fix a fastdiv edge case ( #16199 )
2026-05-14 13:12:18 -07:00
nimlgen
62ea73719d
hcq2: share more with graph ( #16196 )
...
* share more with graph
* comment
2026-05-14 22:28:11 +03:00
George Hotz
3b8cc31759
disable fast idiv by default, it's broken ( #16197 )
...
* disable fast idiv by default, it's broken
* fix fast idiv tests
2026-05-14 11:48:27 -07:00
Christopher Milan
8f811649ff
better compiler_cpu invalid arch errors ( #16194 )
2026-05-14 14:36:14 -04:00
qazal
f03a7fd6d1
viz/cli: readable uop json ( #16195 )
...
* viz/cli: readable uop json repr
* work
* better
2026-05-14 21:33:10 +09:00
C T
1b779a9058
add gelu approximate="none" (match pytorch) ( #16162 )
...
* add gelu approximate="none" (match pytorch)
* lint
* pass through onnx Gelu approximate
* type annotate
* explicit math.sqrt
* keep tinygrad's gelu approximate="tanh" default
2026-05-13 18:53:24 -07:00
chenyu
dd9187d9ee
minor hash cleanups ( #16190 )
...
same kernels
2026-05-13 20:59:24 -04:00
wozeparrot
88ac2ac1fd
llama: cleanups ( #16189 )
2026-05-13 17:08:06 -07:00
Christopher Milan
9a365d9978
ci: fix null image tests ( #16188 )
2026-05-13 18:00:05 -04:00
nimlgen
ad1fb7c981
hcq2: graph ( #16186 )
...
* keep this for now
* early graph
2026-05-13 22:49:43 +03:00
chenyu
3f9f6a51b2
minor image_conv2d cleanup ( #16187 )
...
remove some no-op slices
2026-05-13 15:47:40 -04:00
b1tg
59c34b9fe0
llm: precise device ( #16159 )
...
* llm: precise device
* llm: pass device to precompute_freqs_cis
2026-05-12 21:16:42 -07:00
b1tg
3c806ff406
clean up gguf ( #16160 )
2026-05-12 21:16:10 -07:00
wozeparrot
e97f2c1114
llama: only gemm + fa custom kernel ( #16180 )
...
* llama: tie store to grad directly
* llama: set mp flags
* llama: non fused grad fp8 quantize path
2026-05-12 21:03:49 -07:00
chenyu
38d407fd58
simplify svd more ( #16181 )
...
all the slowness is scheduling
2026-05-12 23:48:22 -04:00
Christopher Milan
f1fdd2ccec
ci: add IMAGE=1 compile-only tests ( #16182 )
...
* ci: add IMAGE=1 compile-only tests
* fix
2026-05-12 23:40:32 -04:00
George Hotz
faf7fb7513
update nir renderer for new image style ( #16179 )
...
* update nir renderer for new image style
* don't cast image indexes
2026-05-12 20:25:01 -07:00
Christopher Milan
7d0c5ab689
ci: ocelot needs nvcc on linux ( #16178 )
...
* ci: ocelot needs nvcc on linux
* cudart
2026-05-12 23:13:48 -04:00
chenyu
32138c2418
svd to mixin ( #16175 )
2026-05-12 22:29:01 -04:00
George Hotz
69e1f3b551
remove vec2 from image in gater ( #16165 )
...
* remove vec2 from image in gater
* only simple idx
* fix python with new image style
* fix vconst
* just vconst and stack
* cast to int there
* fix for const
* fix process replay
2026-05-12 19:25:52 -07:00
chenyu
2172363be5
don't use Tensor indexing in svd ( #16174 )
...
prepare mixin, also about 4X faster for 8x8 input
2026-05-12 21:56:19 -04:00
chenyu
420a08c6d1
qr to mixin ( #16173 )
2026-05-12 21:23:25 -04:00
chenyu
c6a82fe927
functional qr and svd ( #16172 )
...
no clone and setitem, will move to mixin next. slightly faster but still quite slow
2026-05-12 19:12:08 -04:00
Christopher Milan
3844a31f87
ci: untangle cuda/ocelot, less apt ( #16171 )
...
* ci: untangle cuda/ocelot, less apt
* ldconfig
2026-05-12 18:14:03 -04:00
Christopher Milan
316607f004
dsp: don't use docker in ci ( #16167 )
...
* dsp: don't use docker in ci
* add setup script for macos docker
2026-05-12 17:11:03 -04:00
chenyu
bdcdf1f1a1
jittable masked_select and nonzero ( #16170 )
...
* jittable masked_select and nonzero
make jittable with `size=`, matches jax
* COMPILE_ONLY
2026-05-12 16:39:36 -04:00
wozeparrot
a613bcfc6d
allow after on contiguous in spec ( #16169 )
...
* feat: allow after on contiguous
* feat: add test
2026-05-12 13:11:44 -07:00
chenyu
7c3e3fa154
fix empty input for masked_select and nonzero ( #16168 )
2026-05-12 15:36:51 -04:00
chenyu
da3b7e89a4
atol in test_custom_kernel_multi_output_backward_interacting ( #16166 )
2026-05-12 14:42:12 -04:00
chenyu
25583f6dc1
fix cumsum dtype for 0d input ( #16164 )
2026-05-12 14:18:08 -04:00
George Hotz
64c81dfd24
add all codegen stages to spec_tensor ( #16163 )
2026-05-12 10:35:38 -07:00
chenyu
f3e3c3851f
explicit args to Tensor.rand ( #16161 )
...
added requires_grad, other kwargs were silently dropped
2026-05-12 12:53:39 -04:00
nimlgen
e93fb5f9b9
hcq2: remove hcqprogram ( #16157 )
...
* hcq2 rm program
* nonbeauty
* no prog
* tiny
* f
* x
2026-05-12 18:49:13 +03:00
nimlgen
a708542308
fix ci spec ( #16156 )
2026-05-12 17:57:11 +03:00
nimlgen
e5729935c6
time_call ( #16152 )
...
* time_call
* x
* fix caches
2026-05-12 16:58:28 +03:00
qazal
fe39cf148a
add Ops.SOURCE test ( #16155 )
...
* simple failing test
* raises
* change
2026-05-12 22:49:32 +09:00
qazal
5cd0494b14
viz: canonicalize ast for schedule to codegen linking ( #16154 )
...
* simple failing test
* always null device
* viz: canonicalize ast for schedule to codegen linking
* SCACHE
2026-05-12 22:40:21 +09:00
qazal
c1d125ff3b
llm: add markers to --benchmark ( #16153 )
...
* markers in llm
* ui fix
2026-05-12 20:14:11 +09:00
wozeparrot
e9359d9e7d
more llama mp fixes ( #16151 )
...
* llama: SPLIT_W13
* llama: fix with no fused kernels
* llama: cast to bf16 on non asm_gemm path
* llama: new mp flags
2026-05-11 21:29:23 -07:00
chenyu
09fd80fba6
fix randperm and _multi_like drop requires_grad ( #16150 )
2026-05-11 23:23:34 -04:00
George Hotz
8294d105a7
Update the spec in spec.py to match the current state ( #16132 )
...
* start work on specv2
* more spec
* more spec
* fix amd emulator
* more spec
* more
* fix test_uop_graph
* move those
* spec=2
* skip those questionable tests
* ptx fix
* more spec=2
* store
* allow custom function in tensor
* spec 2
* fix beam search for tensor cores
* delete the old specs
* fix import
2026-05-11 20:07:47 -07:00
chenyu
3942a80f66
fix wrong kwargs passed into rands ( #16149 )
...
working towards explicit args for these
2026-05-11 22:22:06 -04:00
Christopher Milan
039d84ff02
Revert "onnx: deduplicate simple proto parsers" ( #16148 )
...
This reverts commit 83eaefcd0f.
2026-05-11 21:45:17 -04:00
Christopher Milan
20f587d5d5
nv: rm _download ( #16147 )
2026-05-11 19:56:37 -04:00
chenyu
371ab2023f
clean up image_dot and image_conv2d ( #16145 )
2026-05-11 19:37:58 -04:00
Vikram Rangarajan
effa263865
Torch backend aten::cat.out fix ( #16121 )
...
* Handle empty 1D tensors in cat_out
* Undid other changes
* Fixed torch cat
* Improved cat.out, added more tests
* Cleaned code
* Type hinted dim
* Removed whitespace
2026-05-11 16:28:16 -07:00