13260 Commits

Author SHA1 Message Date
George Hotz
8612385ccb add all codegen stages to spec_tensor 2026-05-12 10:23:03 -07:00
chenyu
f3e3c3851f explicit args to Tensor.rand (#16161)
added requires_grad, other kwargs were silently dropped
2026-05-12 12:53:39 -04:00
nimlgen
e93fb5f9b9 hcq2: remove hcqprogram (#16157)
* hcq2 rm program

* nonbeauty

* no prog

* tiny

* f

* x
2026-05-12 18:49:13 +03:00
nimlgen
a708542308 fix ci spec (#16156) 2026-05-12 17:57:11 +03:00
nimlgen
e5729935c6 time_call (#16152)
* time_call

* x

* fix caches
2026-05-12 16:58:28 +03:00
qazal
fe39cf148a add Ops.SOURCE test (#16155)
* simple failing test

* raises

* change
2026-05-12 22:49:32 +09:00
qazal
5cd0494b14 viz: canonicalize ast for schedule to codegen linking (#16154)
* simple failing test

* always null device

* viz: canonicalize ast for schedule to codegen linking

* SCACHE
2026-05-12 22:40:21 +09:00
qazal
c1d125ff3b llm: add markers to --benchmark (#16153)
* markers in llm

* ui fix
2026-05-12 20:14:11 +09:00
wozeparrot
e9359d9e7d more llama mp fixes (#16151)
* llama: SPLIT_W13

* llama: fix with no fused kernels

* llama: cast to bf16 on non asm_gemm patH

* llama: new mp flags
2026-05-11 21:29:23 -07:00
chenyu
09fd80fba6 fix randperm and _multi_like drop requires_grad (#16150) 2026-05-11 23:23:34 -04:00
George Hotz
8294d105a7 Update the spec in spec.py to match the current state (#16132)
* start work on specv2

* more spec

* more spec

* fix amd emulator

* more spec

* more

* fix test_uop_graph

* move those

* spec=2

* skip those questionable tests

* ptx fix

* more spec=2

* store

* allow custom function in tensor

* spec 2

* fix beam search for tensor cores

* delete the old specs

* fix import
2026-05-11 20:07:47 -07:00
chenyu
3942a80f66 fix wrong kwargs passed into rands (#16149)
working towards explicit args for these
2026-05-11 22:22:06 -04:00
Christopher Milan
039d84ff02 Revert "onnx: deduplicate simple proto parsers" (#16148)
This reverts commit 83eaefcd0f.
2026-05-11 21:45:17 -04:00
Christopher Milan
20f587d5d5 nv: rm _download (#16147) 2026-05-11 19:56:37 -04:00
chenyu
371ab2023f clean up image_dot and image_conv2d (#16145) 2026-05-11 19:37:58 -04:00
Vikram Rangarajan
effa263865 Torch backend aten::cat.out fix (#16121)
* Handle empty 1D tensors in cat_out

* Undid other changes

* Fixed torch cat

* Improved cat.out, added more tests

* Cleaned code

* Type hinted dim

* Removed whitespace
2026-05-11 16:28:16 -07:00
chenyu
63c1f00b80 disable test_svd_general again (#16146)
flaky on CI
2026-05-11 19:24:32 -04:00
Christopher Milan
2dccd4a3eb am: autogen pmc (#16143)
* am: autogen pmc

* cleanup

* fix

* type
2026-05-11 19:22:12 -04:00
Christopher Milan
7ba55ad3ba nv: autogen regs (#16139)
* nv: autogen regs

* flcn cot

* ci

* gen
2026-05-11 18:52:24 -04:00
chenyu
0b02fb6797 Revert "[pr] match torch rmsnorm (#16122)" (#16144)
This reverts commit 692257dd70.
2026-05-11 17:53:42 -04:00
chenyu
fbe8be0b8b style cleanup to Tensor.qr and svd (#16142)
* style cleanup to Tensor.qr and svd

same kernels

* more

* enable
2026-05-11 17:16:59 -04:00
qazal
fc2cc1d77a viz: call graph renderer example (#16141)
* work

* emits

* this

* cleaner repr for custom binaries

* --call-graph

* _ref

* this

* start

* this

* everything execpt the pyrender

* bring pyrender back
2026-05-12 05:07:30 +09:00
chenyu
f65e343fb3 spec.py cleanups (#16140)
removed END from shared_spec and NOOP from full_spec
2026-05-11 15:59:49 -04:00
Joshua James Venter
692257dd70 [pr] match torch rmsnorm (#16122)
* [pr] match rmsnorm torch

Signed-off-by: Joshua James Venter <venter.joshua@gmail.com>

* 1e-5

* ops.md

---------

Signed-off-by: Joshua James Venter <venter.joshua@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-05-11 14:36:41 -04:00
Sachith Shetty
59a81559d4 fix: add self.device to qr, svd, masked_select intermediates (#16131) 2026-05-11 11:22:54 -04:00
nimlgen
70c2480e71 hcq2 to extra (#16126)
* hcq2 in extra

* correct

* some revert from non-extra

* cln

* cpu

* x

* attach

* min

* remove attach

* linter
2026-05-11 17:17:30 +03:00
nimlgen
ad9738892c get_buf() for Buffer (#16134)
* p

* mypy

* x
2026-05-11 16:36:14 +03:00
qazal
2dd84416bf viz/cli: schedule renderer (#16101)
* simpler steps

* work

* work

* iterate

* faster

* better

* simplify more

* sys stdin

* less

* work

* work and mv

* better

* seen bufs

* all call graphs

* print query

* ux

* param to buffer / buffer_view

* work

* respect NO_COLOR in uop_to_json

* less

* render uops

* rm custom renderer

* call can't pyrender.

* unrelated diff

* assert

* 5
2026-05-11 01:56:16 +09:00
George Hotz
53f9587099 add canary 2026-05-10 09:38:18 -07:00
George Hotz
28cb7f1bcc update readme with contributing guidelines 2026-05-10 09:35:48 -07:00
George Hotz
daed602569 rename BUFFERIZE to STAGE (#16125) 2026-05-10 09:26:46 -07:00
qazal
39ce780907 viz/cli: emit all runs of selected kernel, json fixes (#16124)
* keep print

* --json in tests, sqtt --json err

* work

* import

* less

* line
2026-05-10 21:45:51 +09:00
qazal
51c7dafb0d split viz cli test helpers (#16123) 2026-05-10 19:42:24 +09:00
chenyu
b2a682ec60 remove _shape check in pm_mops [pr] (#16120)
seems fine now
2026-05-09 17:54:22 -04:00
wozeparrot
026688f03f llama: move to correct dir (#16118) 2026-05-08 19:42:16 -07:00
Christopher Milan
a7512e0d12 PYTHON: images have no alignment constraints (by default) (#16115) 2026-05-08 20:35:03 -04:00
Christopher Milan
105b037c3c cl: image alignment in arch (#16106) 2026-05-08 19:33:33 -04:00
Charlie Kerfoot
71a8c0da09 fix: trailing space format string (#16005) 2026-05-08 16:31:10 -07:00
Pawan
4dd6ad3514 gradient: add TRUNC backward (#15925)
* gradient: add TRUNC backward

* test: move round quantization gradient to test_ops
2026-05-08 16:27:55 -07:00
chenyu
5152ff95e7 _pad_constant and avg_pool2d cleanups (#16110) 2026-05-08 18:09:47 -04:00
chenyu
e6584532f4 minor elementwise cleanups (#16102) 2026-05-08 13:38:34 -04:00
nimlgen
49b55af619 jit: simpler free_intermediates (#16099) 2026-05-08 19:08:33 +03:00
chenyu
0f46c08582 div mixin cleanups (#16100) 2026-05-08 12:05:37 -04:00
chenyu
235044c9d8 Ops.IDIV -> Ops.CDIV, Ops.MOD -> Ops.CMOD (#16093)
* Ops.IDIV -> Ops.CDIV, Ops.MOD -> Ops.CMOD

* ruff
2026-05-07 23:18:15 -04:00
Christopher Milan
faabe6aa42 nv: remaining firmware from /lib/firmware (#16088) 2026-05-07 23:07:43 -04:00
b1tg
7ef901a81d llm: moe speedup (#16059) 2026-05-07 19:06:35 -07:00
George Hotz
80da8a4b9c add spec to main tinygrad repo (#16092) 2026-05-07 18:52:49 -07:00
June
83eaefcd0f onnx: deduplicate simple proto parsers (#16085)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-05-07 18:44:27 -07:00
George Hotz
c106c73e51 remove the gate from index (#16081)
* remove the gate from index

* gpt says this works

* remove hanging casts

* simplify

* move that down

* move gates

* ptr

* remove that simplify

* move that
2026-05-07 18:42:00 -07:00
wozeparrot
d11f4d0ec2 fix: don't copy on slice of DP weight (#16089) 2026-05-07 17:58:01 -07:00