Commit Graph

1221 Commits

Author SHA1 Message Date
chenyu
b7280705a7 limit CONST(UNIQUE) to invalids only (#16432) 2026-05-29 16:02:06 -04:00
chenyu
d72d8ee065 .const() should not ignore dtype (#16412)
fixed a bug in postrange, also cleaner
2026-05-28 10:49:15 -04:00
chenyu
6da785562b test_custom_kernel_precompile_multidevice (#16401)
add a test to show what invalids need
2026-05-27 11:19:16 -04:00
chenyu
945ed4f689 revert const unique changes (#16395) 2026-05-27 00:06:41 -04:00
chenyu
fa14cde05c test update for arange and eye (#16394)
these will need explicit clone to make a buffer
2026-05-26 22:48:34 -04:00
George Hotz
156a4438d9 rename BUFFER_VIEW to SLICE (#16391)
* rename BUFFER_VIEW to SLICE

* fix comments
2026-05-26 18:15:00 -07:00
chenyu
0b88827482 remove CONST(UNIQUE) (#16383) 2026-05-26 14:45:22 -04:00
chenyu
d861c50dce remove unique_const (#16382) 2026-05-26 13:53:31 -04:00
chenyu
9b00defc8c Revert "remove unique_const (#16372)" (#16380)
This reverts commit 09019d6761.
2026-05-26 12:30:07 -04:00
chenyu
09019d6761 remove unique_const (#16372)
* remove unique_const

* fix SDWA thing

* that?
2026-05-26 12:18:03 -04:00
George Hotz
7f1b02854e bufferview offset is units of input dtype (#16378) 2026-05-26 08:49:31 -07:00
Christopher Milan
8ddd1328df remove getenv(CI) (#16365)
gone everywhere except test_interop, because torch MPS does not work in actions
2026-05-25 20:23:33 -04:00
George Hotz
689ab6a49f move buffer view offset to src (#16364)
* this work?

* failed
2026-05-25 17:07:55 -07:00
chenyu
149a87dac2 deviceless const cleanups (#16341) 2026-05-22 20:11:01 -04:00
Christopher Milan
451f38155c start cleanup of the slowest tests (#16339) 2026-05-22 18:39:36 -04:00
chenyu
3115952266 more unique const removal prerequisite (#16328) 2026-05-21 23:51:40 -04:00
Christopher Milan
c2d06570a5 remove getenv(CI) from core tinygrad (#16326) 2026-05-21 22:20:33 -04:00
Christopher Milan
150a82de1f start cleaning up dtype tests (#16324) 2026-05-21 21:11:49 -04:00
chenyu
73ea36f4ac full(buffer=True) (#16311)
make full a buffer with flag to turn off
2026-05-21 16:34:44 -04:00
Christopher Milan
172f9493e1 move is_dtype_supported to renderer (#16226) 2026-05-20 21:19:37 -04:00
chenyu
4dbe6a2ee7 remove _force_unique from Tensor init (#16277) 2026-05-20 14:13:05 -04:00
chenyu
890b731b1e more prerequisite test changes for deviceless const (#16264) 2026-05-19 15:43:45 -04:00
ttomsa
aa1e59ab97 X86 with Ops.INS (#14873)
* draft

* cleanup test_encodings

* cleanup test_isel

* model flag state and support rematerialization

* woops

* add vbroadcastss instruction

* don't fuse load if used multiple times in src

* add movabs instruction and fix idiv

* fixes

* add x86 backend to tests

* float16 fix

* rm TwoAddress2nd

* add BARRIER

* test windows ci

* yup isel fixes the mask stuff too and its beautiful

* add cmoves to the spec

* support storing imms

* no TUPLE_ORDER, breaks tests

* fix remaining seg faults

* add float max

* always fuse index

* minor

* fix DEFINE_VAR/SPECIAL and enable multithreading

* linter

* more linter

* more

* more

* more

* let's try this

* perhaps

* start new scheduler

* more scheduling info

* cleaner shuffle functions

* fixup isel tests

* skip bounds check when NOOPs exist

* skip inf rewrite tests

* fix const tag hack and add x86ops to _shape

* fix

* skip a few tests

* func arg order independent from op value

* x86 goes in own linearize

* switch to PARAM

* more

* add min x86op and neg in decomps

* do mulacc in isel

* use def_reg in test_encodings

* enable emulated int64 tests

* how much does this fix

* Ops becomes OpType

* fix

* rm noqa

* rm machine scheduler stuff

* and this

* allow for extending enums and move X86Ops out of uop

* fix imports

* rm X86GroupOp from ops.py

* spacing

* tell mypy to shut up

* more linter

* add x86op test

* allow set[X86Ops] in upat

* move NOOPs to pre_isel_matcher and rm NOOP from spec

* more asserts

* also this

* cleanup encode

* simplify live range

* fix idiv

* add Ops.INS to x86

* more changes

* more changes

* more changes

* fix

* fix

* fix

* fix

* print formatted assembly

* fix 8bit idiv?

* oops

* enable float16  and unaligned vector load/store

* actually no

* move x86 tests

* no more bool cast

* fix

* linter

* linter

* move X86Ops to x86.py

* fix vpbroadcast

* cleanups

* linter

* print correct reg names

* canonical max

* move max/min and add test

* support float16 vector load/store

* rm bad rewrite

* vpsrldq can't access memory

* regalloc takes renderer

* enable vector load/store on all dtypes

* more isel tests

* rm this for now

* a lot better

* fix

* fix

* fix

* deal with flags correctly

* fix

* enable gep noop rule

* fix

* fix

* fix

* add callee saved registers

* use Ops.CONST instead of X86Ops.IMM

* fix

* enable TUPLE_ORDER

* fix

* rm x86 code in linearizer

* fix

* fix

* fix

* move isa rewrites to codegen

* fix

* fix

* skip test_linearizer.py

* skip more tests

* fix

* fix for idiv/mod changes

* fix

* don't use fmadd if it duplicates fused op

* hacky

* fix

* cleanups

* cleanups

* fix

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-05-19 12:42:54 -07:00
George Hotz
3f2d401464 all tests pass with NOOPT=1 (#16257)
* all tests pass with NOOPT=1

* fix a few more

* noopt 100% pass

* noopt 100% pass
2026-05-18 20:39:51 -07:00
chenyu
754344087a assign for deviceless const source (#16248) 2026-05-18 17:39:53 -04:00
chenyu
dcee90aa3f remove requires_grad use in extra/examples (#16238)
except the ones fed into optimizer
2026-05-16 18:40:26 -04:00
chenyu
8631b6f17d remove use of requires_grad in test/ (#16237) 2026-05-16 17:21:07 -04:00
chenyu
0ddc50d050 do not gate backward on requires_grad (#16230)
DETACH is filtered in _deepwalk. instead of None, it gets 0 grad now
2026-05-16 12:29:49 -04:00
chenyu
07a172dbbb remove noop requires_grad_ calls (#16213) 2026-05-15 13:31:10 -04:00
chenyu
c6cf9e8f0c remove test_svd_nonfull_5_5 (#16217)
flaky, kinda overlap with test_svd_general
2026-05-15 13:10:02 -04:00
chenyu
409bb0c9ad requires_grad cannot be None (#16212)
final goal is to remove requires_grad, first change the default to True, and don't allow None
2026-05-15 02:01:04 -04:00
chenyu
a75c14f010 some setitem tests (#16209) 2026-05-14 22:36:25 -04:00
chenyu
ffa1aac7b1 gradient for STORE/AFTER ala clone (#16205) 2026-05-14 20:17:27 -04:00
chenyu
09096ea565 test_gradient_through_clone (#16203)
backward through clone crashes now
2026-05-14 19:26:47 -04:00
b1tg
3c806ff406 clean up gguf (#16160) 2026-05-12 21:16:10 -07:00
chenyu
38d407fd58 simplify svd more (#16181)
all the slowness is scheduling
2026-05-12 23:48:22 -04:00
chenyu
2172363be5 don't use Tensor indexing in svd (#16174)
prepare mixin, also about 4X faster for 8x8 input
2026-05-12 21:56:19 -04:00
wozeparrot
a613bcfc6d allow after on contiguous in spec (#16169)
* feat: allow after on contiguous

* feat: add test
2026-05-12 13:11:44 -07:00
chenyu
da3b7e89a4 atol in test_custom_kernel_multi_output_backward_interacting (#16166) 2026-05-12 14:42:12 -04:00
George Hotz
8294d105a7 Update the spec in spec.py to match the current state (#16132)
* start work on specv2

* more spec

* more spec

* fix amd emulator

* more spec

* more

* fix test_uop_graph

* move those

* spec=2

* skip those questionable tests

* ptx fix

* more spec=2

* store

* allow custom function in tensor

* spec 2

* fix beam search for tensor cores

* delete the old specs

* fix import
2026-05-11 20:07:47 -07:00
chenyu
3942a80f66 fix wrong kwargs passed into rands (#16149)
working towards explicit args for these
2026-05-11 22:22:06 -04:00
chenyu
63c1f00b80 disable test_svd_general again (#16146)
flaky on CI
2026-05-11 19:24:32 -04:00
chenyu
fbe8be0b8b style cleanup to Tensor.qr and svd (#16142)
* style cleanup to Tensor.qr and svd

same kernels

* more

* enable
2026-05-11 17:16:59 -04:00
wozeparrot
4d1a9dca41 fix: don't copy precompiled custom kernel outputs (#16084) 2026-05-07 14:02:38 -07:00
nimlgen
5fa0016ffc supports_exec_item -> supports_uop (#16033) 2026-05-05 22:41:13 +03:00
wozeparrot
419d525553 feat: handle multioutput kernel grads (#16028) 2026-05-02 22:31:45 -07:00
George Hotz
5f441ecffc unify reduce + reduce_axis (#15973)
* unify reduce + reduce_axis

* fix all tests

* lil cleanups
2026-04-29 10:29:56 -07:00
nimlgen
4164666c72 programinfo (#15942)
* programinfo

* fix

* m

* x

* x

* changes

* x

* fix

* rm
2026-04-27 23:12:03 +03:00
nimlgen
96165ff0d1 validate_with_cpu as rewrite (#15938)
* validate_with_cpu as rewrite

* compil

* x

* linter

* moved

* fix
2026-04-26 19:58:53 +03:00
nimlgen
d3378010ee schedule() -> schedule_linear() in tests (batch 1) (#15915)
* schedule_with_vars -> linear_with_vars in tests

* tests batch 1

* batch 2

* estimate_uop

* simpler

* rm
2026-04-24 23:40:53 +03:00