1875 Commits

Author SHA1 Message Date
wozeparrot
a1ec32cfd2 llama: current grad scaling (#16518) 2026-06-05 15:39:41 -07:00
nimlgen
5ebd44aa12 hcq2: merge queues (#16514)
* hcq2: merge queues

* cleaner
2026-06-05 21:20:25 +03:00
qazal
79a13310b3 viz: kernel_graph.txt unique is per schedule (#16511) 2026-06-05 16:17:28 +09:00
nimlgen
3838c8df1b hcq2: move global sync (#16504) 2026-06-04 17:32:40 +03:00
chenyu
0faaf6df26 remove kwargs from arange and linspace [PR] (#16505)
it used to have requires_grad and device, now both are removed
2026-06-04 10:32:37 -04:00
qazal
3b1a5f9770 llama: a_bT and aT_b bf16 gemms (#16487)
* hk_bf16_gemm

* enable in 8b

* cleanups

* rename to USE_HK_BF16_GEMM

* work

* work

* work

* work

* change the gemms

* work

* work

* set as default

* work

* change
2026-06-04 23:30:21 +09:00
nimlgen
11af81f96f hcq2: cleaner (#16502) 2026-06-04 15:26:37 +03:00
chenyu
2c915c61ed no CONST(DEVICE) in torch_backend (#16499) 2026-06-04 00:26:47 -04:00
qazal
f7f03bd7e5 viz: better name for src id in kernel_graph.txt (#16495)
* viz: better name for src id in kernel_graph.txt

* better order

* cleanup
2026-06-04 11:09:29 +09:00
nimlgen
6f2a2857c8 hcq2: refactor deps (#16490) 2026-06-03 23:20:24 +03:00
chenyu
8a4203638a make full with buffer=False deviceless (#16483)
affects arange and eye
2026-06-03 12:35:59 -04:00
qazal
405866f2b7 viz: improve kernel_graph.py usability (#16486)
* better default

* always format kernel output

* also show ref

* sched num
2026-06-03 21:12:44 +09:00
wozeparrot
7dcfd144b6 llama: columnwise fp8 scaling (#16480) 2026-06-02 18:55:45 -07:00
George Hotz
ffadd7a315 remove intel and amx support (#16482) 2026-06-02 18:53:05 -07:00
nimlgen
99e37b1ee3 hcq2: deps (#16459)
* start

* sin

* f
2026-06-02 22:34:25 +03:00
qazal
854eac09c6 llama: no E_ copy after bf16 GEMM (#16458) 2026-06-02 14:14:13 +09:00
chenyu
7e7b481ba7 less CONST(DEVICE) (#16452)
* less CONST(DEVICE)

no DEVICE for single device in const_like, multi has other issues

* maybe

* that?
2026-06-01 15:55:12 -04:00
qazal
29b47a0057 llama: update local amax implementation after ParamArgs change (#16446)
* local amax failing test

* update _local_abs_max_fxn
2026-05-30 16:55:43 +09:00
Christopher Milan
434cfa96a3 ci: no fetch in backend tests (#16438)
should make for less actions cache thrashing
2026-05-29 17:11:16 -04:00
nimlgen
d69aca41a9 hcq2: rework pm_bufferize (#16431) 2026-05-29 22:09:52 +03:00
George Hotz
1e7f1dcf49 add ParamArgs [pr] (#16421)
* add ParamArgs

* fix export

* cleanups

* fixes

* simpler
2026-05-28 19:17:17 -07:00
nimlgen
b0e49afaf1 hcq2: new multi (#16413)
* hcq2: new multi

* op
2026-05-28 22:16:10 +03:00
George Hotz
edca5df25a flip offset and shape in pad and shrink (#16414)
* flip offset and shape in pad and shrink

* dumb test
2026-05-28 11:58:19 -07:00
George Hotz
8ee3a37524 shrink/pad use (new_shape, offset) (#16405)
* shrink uses offset and shape

* pad does too

* fix
2026-05-27 15:13:08 -07:00
qazal
452c7d4230 llama: don't allocate grad_xw13 in bf16 (#16359) 2026-05-28 04:33:07 +09:00
nimlgen
0c385e31c6 hcq2 rewrite (#16375)
* hcq2 rewrite

* fi

* x

* simpler
2026-05-27 22:25:35 +03:00
chenyu
c33b767407 bring back test and torch backend change for unique const (#16403) 2026-05-27 15:16:08 -04:00
chenyu
945ed4f689 revert const unique changes (#16395) 2026-05-27 00:06:41 -04:00
George Hotz
156a4438d9 rename BUFFER_VIEW to SLICE (#16391)
* rename BUFFER_VIEW to SLICE

* fix comments
2026-05-26 18:15:00 -07:00
chenyu
d861c50dce remove unique_const (#16382) 2026-05-26 13:53:31 -04:00
chenyu
9b00defc8c Revert "remove unique_const (#16372)" (#16380)
This reverts commit 09019d6761.
2026-05-26 12:30:07 -04:00
chenyu
09019d6761 remove unique_const (#16372)
* remove unique_const

* fix SDWA thing

* that?
2026-05-26 12:18:03 -04:00
George Hotz
7f1b02854e bufferview offset is units of input dtype (#16378) 2026-05-26 08:49:31 -07:00
nimlgen
032905dec9 hcq2: simpler (#16361) 2026-05-26 14:28:48 +03:00
George Hotz
41ee7dab1c script to generate testsig for DSP (#16371)
* script to generate testsig for DSP

* cleanups
2026-05-25 17:54:58 -07:00
George Hotz
942cb42b97 Revert "hotfix: bump Mac pytest timeout to 4 minutes"
This reverts commit 695a0069ed.
2026-05-25 17:25:11 -07:00
Christopher Milan
8ddd1328df remove getenv(CI) (#16365)
gone everywhere except test_interop, because torch MPS does not work in actions
2026-05-25 20:23:33 -04:00
George Hotz
695a0069ed hotfix: bump Mac pytest timeout to 4 minutes 2026-05-25 17:20:19 -07:00
George Hotz
689ab6a49f move buffer view offset to src (#16364)
* this work?

* failed
2026-05-25 17:07:55 -07:00
qazal
eecd4706ff fix mailbox comment, add types (#16360) 2026-05-25 22:24:00 +09:00
nimlgen
a891727c9f hcq2: multi (#16347)
* hcq2: multi

* cleaner a bit
2026-05-24 19:28:33 +03:00
nimlgen
26b3b3f6a2 hcq2: move submit lowering to schedule (#16330)
* hcq: move submit lowering to schedule

* Dx
2026-05-22 23:15:19 +03:00
qazal
bbfe4f80ec quantize_fp8 kernels in uops (#16288)
* add tests

* simple UOp kernel is n^2

* fast kernel matching c++, opts_to_apply=()

* remove cpp

* simple o(n) kernel, two passes

* fuse the loops

* works on DEV=CPU

* multi regression test

* fix multi, this can possibly be its own bugfix

* test cleanups

* minimal diff

* match C in UOps

* Revert "match C in UOps"

This reverts commit 0bef740c30.

* edit test

* match speed with C try 2

* needs_second_gpu

* cleanup
2026-05-22 20:54:06 +09:00
Christopher Milan
c2d06570a5 remove getenv(CI) from core tinygrad (#16326) 2026-05-21 22:20:33 -04:00
chenyu
31424cda71 Tensor.requires_grad -> is_param (#16325)
for optimizer
2026-05-21 19:39:57 -04:00
wozeparrot
afc5bfa183 llama: remove fused grad accum (#16301) 2026-05-21 09:38:40 -07:00
nimlgen
a321700baa hcq2: multi prereqs (#16304) 2026-05-21 17:00:52 +03:00
Christopher Milan
172f9493e1 move is_dtype_supported to renderer (#16226) 2026-05-20 21:19:37 -04:00
George Hotz
58d58c1659 remove DEVECTORIZE (#16290)
* remove DEVECTORIZE

* fully remove DEVECTORIZE
2026-05-20 13:25:49 -07:00
nimlgen
a88feef40f hcq2: cleanups (#16278)
* s

* simpler

* simpler
2026-05-20 21:48:50 +03:00