Commit Graph

226 Commits

Author SHA1 Message Date
Christopher Milan
434cfa96a3 ci: no fetch in backend tests (#16438)
should make for less actions cache thrashing
2026-05-29 17:11:16 -04:00
chenyu
b7280705a7 limit CONST(UNIQUE) to invalids only (#16432) 2026-05-29 16:02:06 -04:00
George Hotz
1e7f1dcf49 add ParamArgs [pr] (#16421)
* add ParamArgs

* fix export

* cleanups

* fixes

* simpler
2026-05-28 19:17:17 -07:00
Christopher Milan
171401e8df skip modulo by zero in test_dtype_alu (#16404) 2026-05-27 17:09:05 -04:00
chenyu
c33b767407 bring back test and torch backend change for unique const (#16403) 2026-05-27 15:16:08 -04:00
chenyu
3e80f375ee skip test_setitem_fancy_on_unrealized_view (#16400)
crashes in linux llvm ci
2026-05-27 09:50:26 -04:00
chenyu
945ed4f689 revert const unique changes (#16395) 2026-05-27 00:06:41 -04:00
chenyu
fa14cde05c test update for arange and eye (#16394)
these will need explicit clone to make a buffer
2026-05-26 22:48:34 -04:00
George Hotz
156a4438d9 rename BUFFER_VIEW to SLICE (#16391)
* rename BUFFER_VIEW to SLICE

* fix comments
2026-05-26 18:15:00 -07:00
Christopher Milan
3adf7f5d95 disable flaky cl test (#16388) 2026-05-26 19:56:57 -04:00
Christopher Milan
d23659d38b cleanup some old test skips (#16384) 2026-05-26 19:07:22 -04:00
chenyu
d861c50dce remove unique_const (#16382) 2026-05-26 13:53:31 -04:00
George Hotz
bac82d4949 fix emu bug in gfx950 (#16381)
* fix emu bug in gfx950

* fix renderer
2026-05-26 10:32:03 -07:00
chenyu
9b00defc8c Revert "remove unique_const (#16372)" (#16380)
This reverts commit 09019d6761.
2026-05-26 12:30:07 -04:00
chenyu
09019d6761 remove unique_const (#16372)
* remove unique_const

* fix SDWA thing

* that?
2026-05-26 12:18:03 -04:00
wozeparrot
76fc39ccc0 gather to single device (#16354) 2026-05-25 17:27:08 -07:00
Christopher Milan
8ddd1328df remove getenv(CI) (#16365)
gone everywhere except test_interop, because torch MPS does not work in actions
2026-05-25 20:23:33 -04:00
Christopher Milan
d8f86be613 webgpu: shader-f16 support in arch (#16370) 2026-05-25 19:20:59 -04:00
chenyu
5d5e02871f remove Tensor.from_uop (#16344)
and no device for const in Tensor init
2026-05-24 18:53:09 -04:00
chenyu
926d125a63 update test_stack (#16345)
also skip COMPILE_ONLY, it was comparing 0==0
2026-05-23 10:42:35 -04:00
chenyu
149a87dac2 deviceless const cleanups (#16341) 2026-05-22 20:11:01 -04:00
Christopher Milan
451f38155c start cleanup of the slowest tests (#16339) 2026-05-22 18:39:36 -04:00
qazal
bbfe4f80ec quantize_fp8 kernels in uops (#16288)
* add tests

* simple UOp kernel is n^2

* fast kernel matching c++, opts_to_apply=()

* remove cpp

* simple o(n) kernel, two passes

* fuse the loops

* works on DEV=CPU

* multi regression test

* fix multi, this can possibly be its own bugfix

* test cleanups

* minimal diff

* match C in UOps

* Revert "match C in UOps"

This reverts commit 0bef740c30.

* edit test

* match speed with C try 2

* needs_second_gpu

* cleanup
2026-05-22 20:54:06 +09:00
chenyu
3115952266 more unique const removal prerequisite (#16328) 2026-05-21 23:51:40 -04:00
Christopher Milan
c2d06570a5 remove getenv(CI) from core tinygrad (#16326) 2026-05-21 22:20:33 -04:00
Christopher Milan
150a82de1f start cleaning up dtype tests (#16324) 2026-05-21 21:11:49 -04:00
chenyu
31424cda71 Tensor.requires_grad -> is_param (#16325)
for optimizer
2026-05-21 19:39:57 -04:00
chenyu
720a27bed8 remove many requires_grad= args (#16321)
* remove many requires_grad= args

* doc and example

* not cifar
2026-05-21 18:37:11 -04:00
Christopher Milan
172f9493e1 move is_dtype_supported to renderer (#16226) 2026-05-20 21:19:37 -04:00
George Hotz
58d58c1659 remove DEVECTORIZE (#16290)
* remove DEVECTORIZE

* fully remove DEVECTORIZE
2026-05-20 13:25:49 -07:00
chenyu
4dbe6a2ee7 remove _force_unique from Tensor init (#16277) 2026-05-20 14:13:05 -04:00
qazal
1e0fffe256 fused ce llama kernel in UOps (#16263)
* work

* using uops

* delete things

* work

* work

* higher level uops

* cleanups
2026-05-20 19:45:28 +09:00
chenyu
e1715b3b92 extend jit const error to deviceless inputs (#16276) 2026-05-20 02:02:45 -04:00
chenyu
170b857da9 clean up deviceless const _buffer (#16274)
process on CPU similar to multi
2026-05-19 22:47:45 -04:00
chenyu
188d7ec15e clone can take device (#16271)
useful to materialize const on a specific device
2026-05-19 21:29:27 -04:00
chenyu
890b731b1e more prerequisite test changed for deviceless const (#16264) 2026-05-19 15:43:45 -04:00
ttomsa
aa1e59ab97 X86 with Ops.INS (#14873)
* draft

* cleanup test_encodings

* cleanup test_isel

* model flag state and support rematerialization

* woops

* add vbroadcastss instruction

* don't fuse load if used multiple times in src

* add movabs instruction and fix idiv

* fixes

* add x86 backend to tests

* float16 fix

* rm TwoAddress2nd

* add BARRIER

* test windows ci

* yup isel fixes the mask stuff too and its beautiful

* add cmoves to the spec

* support storing imms

* no TUPLE_ORDER, breaks tests

* fix remaining seg faults

* add float max

* always fuse index

* minor

* fix DEFINE_VAR/SPECIAL and enable multithreading

* linter

* more linter

* more

* more

* more

* let's try this

* perhaps

* start new scheduler

* more scheduling info

* cleaner shuffle functions

* fixup isel tests

* skip bounds check when NOOPs exist

* skip inf rewrite tests

* fix const tag hack and add x86ops to _shape

* fix

* skip a few tests

* func arg order independent from op value

* x86 goes in own linearize

* switch to PARAM

* more

* add min x86op and neg in decomps

* do mulacc in isel

* use def_reg in test_encodings

* enable emulated int64 tests

* how much does this fix

* Ops becomes OpType

* fix

* rm noqa

* rm machine scheduler stuff

* and this

* allow for extending enums and move X86Ops out of uop

* fix imports

* rm X86GroupOp from ops.py

* spacing

* tell mypy to shut up

* more linter

* add x86op test

* allow set[X86Ops] in upat

* move NOOPs to pre_isel_matcher and rm NOOP from spec

* more asserts

* also this

* cleanup encode

* simplify live range

* fix idiv

* add Ops.INS to x86

* more changes

* more changes

* more changes

* fix

* fix

* fix

* fix

* print formatted assembly

* fix 8bit idiv?

* oops

* enable float16 and unaligned vector load/store

* actually no

* move x86 tests

* no more bool cast

* fix

* linter

* linter

* move X86Ops to x86.py

* fix vpbroadcast

* cleanups

* linter

* print correct reg names

* canonical max

* move max/min and add test

* support float16 vector load/store

* rm bad rewrite

* vpsrldq can't access memory

* regalloc takes renderer

* enable vector load/store on all dtypes

* more isel tests

* rm this for now

* a lot better

* fix

* fix

* fix

* deal with flags correctly

* fix

* enable gep noop rule

* fix

* fix

* fix

* add callee saved registers

* use Ops.CONST instead of X86Ops.IMM

* fix

* enable TUPLE_ORDER

* fix

* rm x86 code in linearizer

* fix

* fix

* fix

* move isa rewrites to codegen

* fix

* fix

* skip test_linearizer.py

* skip more tests

* fix

* fix for idiv/mod changes

* fix

* don't use fmadd if it duplicates fused op

* hacky

* fix

* cleanups

* cleanups

* fix

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-05-19 12:42:54 -07:00
George Hotz
a120709671 tighten shape spec for broadcasting (#16206)
* tighten shape spec for broadcasting

* use IndexError, not ValueError

* needs size
2026-05-18 22:12:04 -07:00
George Hotz
3f2d401464 all tests pass with NOOPT=1 (#16257)
* all tests pass with NOOPT=1

* fix a few more

* noopt 100% pass

* noopt 100% pass
2026-05-18 20:39:51 -07:00
chenyu
e694d7f222 more deviceless const prerequisites [pr] (#16256)
* more deviceless const prerequisites [pr]

* remove that

* arange.contiguous -> arange.clone in tests

arange will become deviceless const soon, update tests where it needs to be a buffer
2026-05-18 23:14:12 -04:00
chenyu
c1076ed56c Tensor.device and UOp.device can be None (#16255) 2026-05-18 22:08:10 -04:00
chenyu
d532b4f533 multi alu with deviceless const (#16251) 2026-05-18 19:31:53 -04:00
Christopher Milan
7515824a6d ci: actually use clang-20, enable bfloat16 (#16249) 2026-05-18 19:06:43 -04:00
chenyu
73e6b4963b to and shard is noop for deviceless uop (#16247) 2026-05-18 16:11:10 -04:00
chenyu
db639ebe3e deviceless const from UOp (#16243) 2026-05-18 14:14:12 -04:00
chenyu
5ae4dbd599 make slow tests faster (#16244) 2026-05-18 11:42:02 -04:00
chenyu
8631b6f17d remove use of requires_grad in test/ (#16237) 2026-05-16 17:21:07 -04:00
chenyu
0ddc50d050 do not gate backward on requires_grad (#16230)
DETACH is filtered in _deepwalk. instead of None, it gets 0 grad now
2026-05-16 12:29:49 -04:00
qazal
ebcb7b7cc0 fp8 gemm tests with scale args (#16231)
* update atol

* update fp8 path

* more work

* update profile.sh
2026-05-16 20:47:58 +09:00
wozeparrot
2d48d7ab09 remove more invalid (#16227) 2026-05-16 02:52:27 -07:00