13507 Commits

Author SHA1 Message Date
George Hotz
e9569b8799 fix dsp 2026-06-03 12:09:46 -07:00
George Hotz
460c50710c correct webgpu is_packed 2026-06-03 11:58:54 -07:00
George Hotz
a01808c804 fix webgpu 2026-06-03 11:12:57 -07:00
George Hotz
72b7da7501 Merge branch 'master' into cstyle_new_style 2026-06-03 10:57:54 -07:00
George Hotz
cee472a0ef renderer Estimates uses maxel (#16485) 2026-06-03 10:55:00 -07:00
chenyu
8a4203638a make full with buffer=False deviceless (#16483)
affects arange and eye
2026-06-03 12:35:59 -04:00
qazal
405866f2b7 viz: improve kernel_graph.py usability (#16486)
* better default

* always format kernel output

* also show ref

* sched num
2026-06-03 21:12:44 +09:00
George Hotz
365e95bddf fixes 2026-06-02 20:21:21 -07:00
George Hotz
5b6ec75341 fix hip 2026-06-02 20:04:41 -07:00
George Hotz
ce84391f33 switch cstyle renderer to new style 2026-06-02 19:50:54 -07:00
Christopher Milan
f43cba5765 ci: native python where possible (#16473)
linters stays at 3.11
2026-06-02 22:40:12 -04:00
George Hotz
4e65ddfad5 cstyle new style 2026-06-02 18:56:51 -07:00
wozeparrot
7dcfd144b6 llama: columnwise fp8 scaling (#16480) 2026-06-02 18:55:45 -07:00
George Hotz
ffadd7a315 remove intel and amx support (#16482) 2026-06-02 18:53:05 -07:00
George Hotz
5f439e3b7c refactor cstyle to avoid dtype [PR] (#16478)
* refactor cstyle to avoid dtype

* clean up rules

* add new style option
2026-06-02 18:27:12 -07:00
Christopher Milan
80eeb4dd21 mockgpu: use autogen.libc (#16479) 2026-06-02 19:59:36 -04:00
chenyu
a43b55d480 deviceless const folding schedule test (#16477) 2026-06-02 18:46:30 -04:00
George Hotz
14f843737b renderer cleanups (pt 3) [PR] (#16475)
* renderer cleanups (pt 3)

* point refactors

* fix bugs

* fix PR
2026-06-02 14:24:24 -07:00
nimlgen
99e37b1ee3 hcq2: deps (#16459)
* start

* sin

* f
2026-06-02 22:34:25 +03:00
George Hotz
82f1c983d4 clean renderer migrations [pr] (#16472)
* clean renderer migrations

* minor webgpu

* use PARAM UOp as API

* make linter happy
2026-06-02 11:19:00 -07:00
Christopher Milan
9897658895 ci: fix ocelot compilation on macos (#16471) 2026-06-02 12:43:31 -04:00
chenyu
6b7d2b91df update test_uop_graph (#16470)
use UOp methods instead of constructing UOp directly, some of it violated spec
2026-06-02 08:53:54 -04:00
qazal
854eac09c6 llama: no E_ copy after bf16 GEMM (#16458) 2026-06-02 14:14:13 +09:00
George Hotz
7d8ed8d4d7 add store to buffer's addrspace (#16468) 2026-06-01 22:07:43 -07:00
George Hotz
20242fdf1d update test + spec from shrink_in_render (#16467)
* update test + spec from shrink_in_render

* cast
2026-06-01 19:24:43 -07:00
Christopher Milan
c6cad1ad67 ci: standardize runs-on (#16466)
* ci: use macos 26

* ugh github

* stick with github for arm
2026-06-01 21:39:58 -04:00
Christopher Milan
b0ecbb34d9 ci: cleanup python backend tests (#16465) 2026-06-01 20:08:05 -04:00
Christopher Milan
2d0f132a3b ci: cleanup more duplicate tests (#16462) 2026-06-01 18:56:29 -04:00
wozeparrot
aab9a5a8a3 llama: allow specifying layer count (#16464) 2026-06-01 15:36:04 -07:00
chenyu
0167401fa2 minor hcopt WHERE cleanup [PR] (#16463) 2026-06-01 17:58:38 -04:00
George Hotz
124d2f8227 anon addrspace from new renderer (#16461)
* anon addrspace from new renderer

* use max_numel in python renderer

* add sizes to ptrs in tests

* more

* correct fix
2026-06-01 14:42:02 -07:00
chenyu
517eea5985 no CONST(DEVICE) in create_allreduce_function (#16460) 2026-06-01 17:12:34 -04:00
chenyu
7e7b481ba7 less CONST(DEVICE) (#16452)
* less CONST(DEVICE)

no DEVICE for single device in const_like, multi has other issues

* maybe

* that?
2026-06-01 15:55:12 -04:00
George Hotz
556defa0f7 minor updates from vec removal (#16456) 2026-05-31 09:48:51 -07:00
Javier De Jesus
989f713c1b support negative pads in circular pad mode (#16448) 2026-05-31 09:28:45 -07:00
nimlgen
2c2cb339e0 fix word wrap (#16450) 2026-05-30 23:21:24 +03:00
qazal
29b47a0057 llama: update local amax implementation after ParamArgs change (#16446)
* local amax failing test

* update _local_abs_max_fxn
2026-05-30 16:55:43 +09:00
wozeparrot
6795c2d5c9 llama: zero grad this way (#16445) 2026-05-29 20:25:21 -07:00
George Hotz
cf55aaf01f python prg is pkl uops (#16443)
* python prg is pkl uops

* refactor to use uop

* refactor to u.
2026-05-29 19:13:51 -07:00
Christopher Milan
c377d01491 ci: run dsp on tinygrad[testing] (#16442) 2026-05-29 21:16:56 -04:00
wozeparrot
c23652e486 llama: minimize peak init mem (#16440) 2026-05-29 18:00:37 -07:00
Christopher Milan
d943493b79 ci: remove duplicate op compile test (#16441) 2026-05-29 19:20:31 -04:00
chenyu
8ac62b28e5 fix AffineGrid fusion (#16439) 2026-05-29 17:59:47 -04:00
Christopher Milan
ef50a49693 ci: macos dev matrix (#16436) 2026-05-29 17:40:32 -04:00
Christopher Milan
434cfa96a3 ci: no fetch in backend tests (#16438)
should make for less actions cache thrashing
2026-05-29 17:11:16 -04:00
chenyu
b7280705a7 limit CONST(UNIQUE) to invalids only (#16432) 2026-05-29 16:02:06 -04:00
George Hotz
9506b78d73 fix viz addrspace (#16437)
* fix viz addrspace

* revert that
2026-05-29 12:58:05 -07:00
nimlgen
d69aca41a9 hcq2: rework pm_bufferize (#16431) 2026-05-29 22:09:52 +03:00
George Hotz
e2a0434403 full derivation of addrspace (#16433)
* full derivation of addrspace

* w/e, it fixes it
2026-05-29 11:39:31 -07:00
wozeparrot
6787de9f52 llama: fix mp (#16434) 2026-05-29 11:21:43 -07:00