5694 Commits

Author SHA1 Message Date
chenyu
e1715b3b92 extend jit const error to deviceless inputs (#16276) 2026-05-20 02:02:45 -04:00
chenyu
170b857da9 clean up deviceless const _buffer (#16274)
process on CPU similar to multi
2026-05-19 22:47:45 -04:00
chenyu
188d7ec15e clone can take device (#16271)
useful to materialize const on a specific device
2026-05-19 21:29:27 -04:00
George Hotz
55515747b7 Remove Ops.VCONST (#16267)
* start removing vconst

* remove a lot of vconst

* const folding + strict ordering

* update tests

* spec from minigen

* move that
2026-05-19 16:35:24 -07:00
Christopher Milan
7cdd9cbdeb PYTHONREMU: V_CVT_PK_BF8_F32 saturation (#16268) 2026-05-19 19:29:59 -04:00
Christopher Milan
bb2a51f1ea fix mypy mockgpu and add tinygrad.renderer.isa to packages (#16265) 2026-05-19 16:45:03 -04:00
chenyu
890b731b1e more prerequisite test changes for deviceless const (#16264) 2026-05-19 15:43:45 -04:00
ttomsa
aa1e59ab97 X86 with Ops.INS (#14873)
* draft

* cleanup test_encodings

* cleanup test_isel

* model flag state and support rematerialization

* woops

* add vbroadcastss instruction

* don't fuse load if used multiple times in src

* add movabs instruction and fix idiv

* fixes

* add x86 backend to tests

* float16 fix

* rm TwoAddress2nd

* add BARRIER

* test windows ci

* yup isel fixes the mask stuff too and its beautiful

* add cmoves to the spec

* support storing imms

* no TUPLE_ORDER, breaks tests

* fix remaining seg faults

* add float max

* always fuse index

* minor

* fix DEFINE_VAR/SPECIAL and enable multithreading

* linter

* more linter

* more

* more

* more

* let's try this

* perhaps

* start new scheduler

* more scheduling info

* cleaner shuffle functions

* fixup isel tests

* skip bounds check when NOOPs exist

* skip inf rewrite tests

* fix const tag hack and add x86ops to _shape

* fix

* skip a few tests

* func arg order independent from op value

* x86 goes in own linearize

* switch to PARAM

* more

* add min x86op and neg in decomps

* do mulacc in isel

* use def_reg in test_encodings

* enable emulated int64 tests

* how much does this fix

* Ops becomes OpType

* fix

* rm noqa

* rm machine scheduler stuff

* and this

* allow for extending enums and move X86Ops out of uop

* fix imports

* rm X86GroupOp from ops.py

* spacing

* tell mypy to shut up

* more linter

* add x86op test

* allow set[X86Ops] in upat

* move NOOPs to pre_isel_matcher and rm NOOP from spec

* more asserts

* also this

* cleanup encode

* simplify live range

* fix idiv

* add Ops.INS to x86

* more changes

* more changes

* more changes

* fix

* fix

* fix

* fix

* print formatted assembly

* fix 8bit idiv?

* oops

* enable float16 and unaligned vector load/store

* actually no

* move x86 tests

* no more bool cast

* fix

* linter

* linter

* move X86Ops to x86.py

* fix vpbroadcast

* cleanups

* linter

* print correct reg names

* canonical max

* move max/min and add test

* support float16 vector load/store

* rm bad rewrite

* vpsrldq can't access memory

* regalloc takes renderer

* enable vector load/store on all dtypes

* more isel tests

* rm this for now

* a lot better

* fix

* fix

* fix

* deal with flags correctly

* fix

* enable gep noop rule

* fix

* fix

* fix

* add callee saved registers

* use Ops.CONST instead of X86Ops.IMM

* fix

* enable TUPLE_ORDER

* fix

* rm x86 code in linearizer

* fix

* fix

* fix

* move isa rewrites to codegen

* fix

* fix

* skip test_linearizer.py

* skip more tests

* fix

* fix for idiv/mod changes

* fix

* don't use fmadd if it duplicates fused op

* hacky

* fix

* cleanups

* cleanups

* fix

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-05-19 12:42:54 -07:00
Sachith Shetty
74567c1958 fix: pass input device to ONNX helper internal tensors (#16242)
* fix: pass input device to onnx methods internal tensors

* test: onnx helper internal tensors use input device
2026-05-19 11:16:33 -07:00
Christopher Milan
a178301dbe PYTHONREMU: fix CDNA VOP3 conditional writes (#16258) 2026-05-19 13:31:31 -04:00
George Hotz
a120709671 tighten shape spec for broadcasting (#16206)
* tighten shape spec for broadcasting

* use IndexError, not ValueError

* needs size
2026-05-18 22:12:04 -07:00
George Hotz
3f2d401464 all tests pass with NOOPT=1 (#16257)
* all tests pass with NOOPT=1

* fix a few more

* noopt 100% pass

* noopt 100% pass
2026-05-18 20:39:51 -07:00
chenyu
e694d7f222 more deviceless const prerequisites [pr] (#16256)
* more deviceless const prerequisites [pr]

* remove that

* arange.contiguous -> arange.clone in tests

arange will become deviceless const soon, update tests where it needs to be a buffer
2026-05-18 23:14:12 -04:00
chenyu
c1076ed56c Tensor.device and UOp.device can be None (#16255) 2026-05-18 22:08:10 -04:00
chenyu
d532b4f533 multi alu with deviceless const (#16251) 2026-05-18 19:31:53 -04:00
Christopher Milan
7515824a6d ci: actually use clang-20, enable bfloat16 (#16249) 2026-05-18 19:06:43 -04:00
chenyu
754344087a assign for deviceless const source (#16248) 2026-05-18 17:39:53 -04:00
chenyu
73e6b4963b to and shard is noop for deviceless uop (#16247) 2026-05-18 16:11:10 -04:00
chenyu
db639ebe3e deviceless const from UOp (#16243) 2026-05-18 14:14:12 -04:00
chenyu
5ae4dbd599 make slow tests faster (#16244) 2026-05-18 11:42:02 -04:00
chenyu
dcee90aa3f remove requires_grad use in extra/examples (#16238)
except the ones fed into optimizer
2026-05-16 18:40:26 -04:00
chenyu
8631b6f17d remove use of requires_grad in test/ (#16237) 2026-05-16 17:21:07 -04:00
chenyu
0ddc50d050 do not gate backward on requires_grad (#16230)
DETACH is filtered in _deepwalk. instead of None, it gets 0 grad now
2026-05-16 12:29:49 -04:00
qazal
ebcb7b7cc0 fp8 gemm tests with scale args (#16231)
* update atol

* update fp8 path

* more work

* update profile.sh
2026-05-16 20:47:58 +09:00
wozeparrot
2d48d7ab09 remove more invalid (#16227) 2026-05-16 02:52:27 -07:00
Christopher Milan
79c0ae5b89 metal: arch is GPU family (#16223) 2026-05-15 21:22:48 -04:00
chenyu
d62c1d83c0 remove Tensor.eye override (#16219)
* remove Tensor.eye override

was only needed for requires_grad arg

* README
2026-05-15 15:40:34 -04:00
chenyu
07a172dbbb remove noop requires_grad_ calls (#16213) 2026-05-15 13:31:10 -04:00
chenyu
c6cf9e8f0c remove test_svd_nonfull_5_5 (#16217)
flaky, and it largely overlaps with test_svd_general
2026-05-15 13:10:02 -04:00
chenyu
409bb0c9ad requires_grad cannot be None (#16212)
final goal is to remove requires_grad, first change the default to True, and don't allow None
2026-05-15 02:01:04 -04:00
chenyu
a612b88abb better assert when setitem a refed tensor (#16210)
also decouple from requires_grad
2026-05-14 23:40:29 -04:00
chenyu
a75c14f010 some setitem tests (#16209) 2026-05-14 22:36:25 -04:00
Christopher Milan
891a1ae7c2 onnx: remove dtype_fallback (#15717) 2026-05-14 22:06:57 -04:00
chenyu
ffa1aac7b1 gradient for STORE/AFTER ala clone (#16205) 2026-05-14 20:17:27 -04:00
chenyu
09096ea565 test_gradient_through_clone (#16203)
backward through clone crashes now
2026-05-14 19:26:47 -04:00
George Hotz
83ec66da34 fix a fastdiv edge case (#16199) 2026-05-14 13:12:18 -07:00
George Hotz
3b8cc31759 disable fast idiv by default, it's broken (#16197)
* disable fast idiv by default, it's broken

* fix fast idiv tests
2026-05-14 11:48:27 -07:00
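For readers unfamiliar with the term, "fast idiv" usually means replacing integer division by a constant with a multiply and a shift. The sketch below shows the textbook unsigned version in plain Python. It is an illustration only, not tinygrad's implementation; the names `magic` and `fast_div` and the 32-bit default are assumptions made for this example.

```python
from typing import Tuple

def magic(d: int, bits: int = 32) -> Tuple[int, int]:
    """Return (mult, shift) so that (x * mult) >> shift == x // d for 0 <= x < 2**bits."""
    if d <= 0:
        raise ValueError("divisor must be positive")
    # (d - 1).bit_length() equals ceil(log2(d)) for d >= 1
    shift = bits + (d - 1).bit_length()
    # ceil(2**shift / d), computed with integer arithmetic only
    mult = -(-(1 << shift) // d)
    return mult, shift

def fast_div(x: int, d: int, bits: int = 32) -> int:
    """Unsigned division by a constant via multiply and shift (hypothetical helper)."""
    mult, shift = magic(d, bits)
    return (x * mult) >> shift
```

Python integers never overflow, so this version is exact over the whole range. In fixed-width arithmetic the multiplier can need `bits + 1` bits, which is a common source of edge cases in this technique.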
C T
1b779a9058 add gelu approximate="none" (match pytorch) (#16162)
* add gelu approximate="none" (match pytorch)

* lint

* pass through onnx Gelu approximate

* type annotate

* explicit math.sqrt

* keep tinygrad's gelu approximate="tanh" default
2026-05-13 18:53:24 -07:00
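The two `approximate` modes named in this commit correspond to the exact erf-based GELU and its tanh approximation. A minimal scalar sketch follows, using only the standard library; it is not the tinygrad code, and the function names are hypothetical. The `"tanh"` default mirrors the last bullet of the commit message.

```python
import math

def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # Tanh approximation of GELU
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

def gelu(x: float, approximate: str = "tanh") -> float:
    # Dispatch on the mode string, as the `approximate` argument suggests
    if approximate == "none":
        return gelu_exact(x)
    if approximate == "tanh":
        return gelu_tanh(x)
    raise ValueError(f"unknown approximate mode: {approximate!r}")
```

The two variants agree closely for moderate inputs, so switching modes mainly matters when matching another framework's outputs to tight tolerances.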
b1tg
3c806ff406 clean up gguf (#16160) 2026-05-12 21:16:10 -07:00
chenyu
38d407fd58 simplify svd more (#16181)
all the slowness is scheduling
2026-05-12 23:48:22 -04:00
chenyu
32138c2418 svd to mixin (#16175) 2026-05-12 22:29:01 -04:00
chenyu
2172363be5 don't use Tensor indexing in svd (#16174)
prepare mixin, also about 4X faster for 8x8 input
2026-05-12 21:56:19 -04:00
chenyu
420a08c6d1 qr to mixin (#16173) 2026-05-12 21:23:25 -04:00
chenyu
bdcdf1f1a1 jittable masked_select and nonzero (#16170)
* jittable masked_select and nonzero

make jittable with `size=`, matches jax

* COMPILE_ONLY
2026-05-12 16:39:36 -04:00
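The `size=` argument makes the output shape independent of the data, which is what a JIT needs. The sketch below illustrates the idea on plain Python lists, following the JAX convention of truncating extra hits and padding with a fill value. Whether tinygrad pads and truncates in exactly this way is an assumption; the helper names and the `fill_value` parameter are hypothetical.

```python
from typing import List, Sequence

def nonzero_fixed(mask: Sequence[int], size: int, fill_value: int = 0) -> List[int]:
    """Indices of truthy entries, truncated or padded to exactly `size` items."""
    idx = [i for i, m in enumerate(mask) if m][:size]
    return idx + [fill_value] * (size - len(idx))

def masked_select_fixed(values: Sequence[float], mask: Sequence[int],
                        size: int, fill_value: float = 0.0) -> List[float]:
    """Values where mask is truthy, truncated or padded to exactly `size` items."""
    picked = [v for v, m in zip(values, mask) if m][:size]
    return picked + [fill_value] * (size - len(picked))
```

With a fixed `size`, an empty or all-false mask still yields an output of the requested length, filled entirely with `fill_value`.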
wozeparrot
a613bcfc6d allow after on contiguous in spec (#16169)
* feat: allow after on contiguous

* feat: add test
2026-05-12 13:11:44 -07:00
chenyu
7c3e3fa154 fix empty input for masked_select and nonzero (#16168) 2026-05-12 15:36:51 -04:00
chenyu
da3b7e89a4 atol in test_custom_kernel_multi_output_backward_interacting (#16166) 2026-05-12 14:42:12 -04:00
chenyu
25583f6dc1 fix cumsum dtype for 0d input (#16164) 2026-05-12 14:18:08 -04:00
George Hotz
64c81dfd24 add all codegen stages to spec_tensor (#16163) 2026-05-12 10:35:38 -07:00
chenyu
f3e3c3851f explicit args to Tensor.rand (#16161)
added requires_grad, other kwargs were silently dropped
2026-05-12 12:53:39 -04:00