Commit Graph

1235 Commits

Author SHA1 Message Date
qazal
b2e95b2db3 rangeify: no copies for write+read of same slice (#16585)
* failing test

* cleaner failing tests

* assign and read of same slice shouldn't create copies

* err in the changes

* shrink with no overlapping regions in dest is fine
2026-06-13 02:19:47 +09:00
Philip Sinitsin
76c10cd635 jit: don't memplan buffers reachable from live tensors (#16588)
The memory planner was suballocating BUFFERs created during JIT capture that are still referenced by external lazy tensor graphs, like the .grad tensors assigned by backward(). The replay then only writes the arena slices, so realizing such a tensor after the call reads freshly allocated memory and silently returns zeros. Hold every BUFFER reachable from a live Tensor instead of only the parameters of the return value; true internals are still planned. Fixes #16571.
2026-06-12 17:51:54 +03:00
qazal
a83710396c support mselect input to CALL, less kernels in allreduce (#16567)
* support mselect input to CALL, less kernels in allreduce

* resolve mstack
2026-06-11 18:10:47 +09:00
qazal
21f1101691 add allreduce kernel count test (#16566) 2026-06-11 15:54:12 +09:00
George Hotz
7e6d617935 addrspace cleanups (#16565)
* addrspace cleanups

* bumps

* eh, relax a little
2026-06-10 15:57:18 -07:00
qazal
34481830f1 rangeify: fix cost function for AFTER(out, CALL) (#16559)
* simple failing test

* fix rangeify cost function

* new ops count
2026-06-10 17:30:50 +09:00
qazal
d18ad49f20 fix flaky test_disktensor (#16549) 2026-06-09 18:23:22 +09:00
qazal
fa400f9790 less E kernels in all2all (#16546) 2026-06-09 13:51:57 +09:00
qazal
b8931440ae add all2all schedule test (#16545) 2026-06-09 12:41:35 +09:00
Christopher Milan
8c0ba1da5c cleanup more from test/backend (#16521) 2026-06-05 18:38:46 -04:00
Christopher Milan
9b0f75622c many jit tests belong in unit (#16508) 2026-06-04 21:36:53 -04:00
chenyu
bb407d8b3c fix transform_precompiled_call for MULTI (#16510)
based on my understanding for https://github.com/tinygrad/tinygrad/pull/16084
2026-06-04 20:09:58 -04:00
chenyu
4a8bf07a87 remove CONST(DEVICE) (#16506) 2026-06-04 11:29:46 -04:00
chenyu
8a4203638a make full with buffer=False deviceless (#16483)
affects arange and eye
2026-06-03 12:35:59 -04:00
chenyu
b7280705a7 limit CONST(UNIQUE) to invalids only (#16432) 2026-05-29 16:02:06 -04:00
chenyu
d72d8ee065 .const() should not ignore dtype (#16412)
fixed a bug in postrange, also cleaner
2026-05-28 10:49:15 -04:00
chenyu
6da785562b test_custom_kernel_precompile_multidevice (#16401)
add a test to show what invalids need
2026-05-27 11:19:16 -04:00
chenyu
945ed4f689 revert const unique changes (#16395) 2026-05-27 00:06:41 -04:00
chenyu
fa14cde05c test update for arange and eye (#16394)
these will need explicit clone to make a buffer
2026-05-26 22:48:34 -04:00
George Hotz
156a4438d9 rename BUFFER_VIEW to SLICE (#16391)
* rename BUFFER_VIEW to SLICE

* fix comments
2026-05-26 18:15:00 -07:00
chenyu
0b88827482 remove CONST(UNIQUE) (#16383) 2026-05-26 14:45:22 -04:00
chenyu
d861c50dce remove unique_const (#16382) 2026-05-26 13:53:31 -04:00
chenyu
9b00defc8c Revert "remove unique_const (#16372)" (#16380)
This reverts commit 09019d6761.
2026-05-26 12:30:07 -04:00
chenyu
09019d6761 remove unique_const (#16372)
* remove unique_const

* fix SDWA thing

* that?
2026-05-26 12:18:03 -04:00
George Hotz
7f1b02854e bufferview offset is units of input dtype (#16378) 2026-05-26 08:49:31 -07:00
Christopher Milan
8ddd1328df remove getenv(CI) (#16365)
gone everywhere except test_interop, because torch MPS does not work in actions
2026-05-25 20:23:33 -04:00
George Hotz
689ab6a49f move buffer view offset to src (#16364)
* this work?

* failed
2026-05-25 17:07:55 -07:00
chenyu
149a87dac2 deviceless const cleanups (#16341) 2026-05-22 20:11:01 -04:00
Christopher Milan
451f38155c start cleanup of the slowest tests (#16339) 2026-05-22 18:39:36 -04:00
chenyu
3115952266 more unique const removal prerequisite (#16328) 2026-05-21 23:51:40 -04:00
Christopher Milan
c2d06570a5 remove getenv(CI) from core tinygrad (#16326) 2026-05-21 22:20:33 -04:00
Christopher Milan
150a82de1f start cleaning up dtype tests (#16324) 2026-05-21 21:11:49 -04:00
chenyu
73ea36f4ac full(buffer=True) (#16311)
make full a buffer with flag to turn off
2026-05-21 16:34:44 -04:00
Christopher Milan
172f9493e1 move is_dtype_supported to renderer (#16226) 2026-05-20 21:19:37 -04:00
chenyu
4dbe6a2ee7 remove _force_unique from Tensor init (#16277) 2026-05-20 14:13:05 -04:00
chenyu
890b731b1e more prerequisuite test changed for deviceless const (#16264) 2026-05-19 15:43:45 -04:00
ttomsa
aa1e59ab97 X86 with Ops.INS (#14873)
* draft

* cleanup test_encodings

* cleanup test_isel

* model flag state and support rematerialization

* woops

* add vbroadcastss instruction

* don't fuse load if used multiple times in src

* add movabs instruction and fix idiv

* fixes

* add x86 backend to tests

* float16 fix

* rm TwoAddress2nd

* add BARRIER

* test windows ci

* yup isel fixes the mask stuff too and its beautiful

* add cmoves to the spec

* support storing imms

* no TUPLE_ORDER, breaks tests

* fix remaining seg faults

* add float max

* always fuse index

* minor

* fix DEFINE_VAR/SPECIAL and enable multithreading

* linter

* more linter

* more

* more

* more

* let's try this

* perhaps

* start new scheduler

* more scheduling info

* cleaner shuffle functions

* fixup isel tests

* skip bounds check when NOOPs exist

* skip inf rewrite tests

* fix const tag hack and add x86ops to _shape

* fix

* skip a few tests

* func arg order independent from op value

* x86 goes in own linearize

* switch to PARAM

* more

* add min x86op and neg in decomps

* do mulacc in isel

* use def_reg in test_encodings

* enable emulated int64 tests

* how much does this fix

* Ops becomes OpType

* fix

* rm noqa

* rm machine scheduler stuff

* and this

* allow for extending enums and move X86Ops out of uop

* fix imports

* rm X86GroupOp from ops.py

* spacing

* tell mypy to shut up

* more linter

* add x86op test

* allow set[X86Ops] in upat

* move NOOPs to pre_isel_matcher and rm NOOP from spec

* more asserts

* also this

* cleanup encode

* simplify live range

* fix idiv

* add Ops.INS to x86

* more changes

* more changes

* more changes

* fix

* fix

* fix

* fix

* print formatted assembly

* fix 8bit idiv?

* oops

* enable float16  and unaligned vector load/store

* actually no

* move x86 tests

* no more bool cast

* fix

* linter

* linter

* move X86Ops to x86.py

* fix vpbroadcast

* cleanups

* linter

* print correct reg names

* canonical max

* move max/min and add test

* support float16 vector load/store

* rm bad rewrite

* vpsrldq can't access memory

* regalloc takes renderer

* enable vector load/store on all dtypes

* more isel tests

* rm this for now

* a lot better

* fix

* fix

* fix

* deal with flags correctly

* fix

* enable gep noop rule

* fix

* fix

* fix

* add callee saved registers

* use Ops.CONST instead of X86Ops.IMM

* fix

* enable TUPLE_ORDER

* fix

* rm x86 code in linearizer

* fix

* fix

* fix

* move isa rewrites to codegen

* fix

* fix

* skip test_linearizer.py

* skip more tests

* fix

* fix for idiv/mod changes

* fix

* don't use fmadd if it duplicates fused op

* hacky

* fix

* cleanups

* cleanups

* fix

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-05-19 12:42:54 -07:00
George Hotz
3f2d401464 all tests pass with NOOPT=1 (#16257)
* all tests pass with NOOPT=1

* fix a few more

* noopt 100% pass

* noopt 100% pass
2026-05-18 20:39:51 -07:00
chenyu
754344087a assign for deviceless const source (#16248) 2026-05-18 17:39:53 -04:00
chenyu
dcee90aa3f remove requires_grad use in extra/examples (#16238)
except the ones fed into optimizer
2026-05-16 18:40:26 -04:00
chenyu
8631b6f17d remove use of requires_grad in test/ (#16237) 2026-05-16 17:21:07 -04:00
chenyu
0ddc50d050 do not gate backward on requires_grad (#16230)
DETACH is filtered in _deepwalk. instead of None, it gets 0 grad now
2026-05-16 12:29:49 -04:00
chenyu
07a172dbbb remove noop requires_grad_ calls (#16213) 2026-05-15 13:31:10 -04:00
chenyu
c6cf9e8f0c remove test_svd_nonfull_5_5 (#16217)
flaky, kinda overlap with test_svd_general
2026-05-15 13:10:02 -04:00
chenyu
409bb0c9ad requires_grad cannot be None (#16212)
final goal is to remove requires_grad, first change the default to True, and don't allow None
2026-05-15 02:01:04 -04:00
chenyu
a75c14f010 some setitem tests (#16209) 2026-05-14 22:36:25 -04:00
chenyu
ffa1aac7b1 gradient for STORE/AFTER ala clone (#16205) 2026-05-14 20:17:27 -04:00
chenyu
09096ea565 test_gradient_through_clone (#16203)
backward through clone crashes now
2026-05-14 19:26:47 -04:00
b1tg
3c806ff406 clean up gguf (#16160) 2026-05-12 21:16:10 -07:00
chenyu
38d407fd58 simplify svd more (#16181)
all the slowness is scheduling
2026-05-12 23:48:22 -04:00