wozeparrot
a1ec32cfd2
llama: current grad scaling ( #16518 )
2026-06-05 15:39:41 -07:00
nimlgen
5ebd44aa12
hcq2: merge queues ( #16514 )
...
* hcq2: merge queues
* cleaner
2026-06-05 21:20:25 +03:00
qazal
79a13310b3
viz: kernel_graph.txt unique is per schedule ( #16511 )
2026-06-05 16:17:28 +09:00
nimlgen
3838c8df1b
hcq2: move global sync ( #16504 )
2026-06-04 17:32:40 +03:00
chenyu
0faaf6df26
remove kwargs from arange and linspace [PR] ( #16505 )
...
it used to have requires_grad and device, now both are removed
2026-06-04 10:32:37 -04:00
qazal
3b1a5f9770
llama: a_bT and aT_b bf16 gemms ( #16487 )
...
* hk_bf16_gemm
* enable in 8b
* cleanups
* rename to USE_HK_BF16_GEMM
* work
* work
* work
* work
* change the gemms
* work
* work
* set as default
* work
* change
2026-06-04 23:30:21 +09:00
nimlgen
11af81f96f
hcq2: cleaner ( #16502 )
2026-06-04 15:26:37 +03:00
chenyu
2c915c61ed
no CONST(DEVICE) in torch_backend ( #16499 )
2026-06-04 00:26:47 -04:00
qazal
f7f03bd7e5
viz: better name for src id in kernel_graph.txt ( #16495 )
...
* viz: better name for src id in kernel_graph.txt
* better order
* cleanup
2026-06-04 11:09:29 +09:00
nimlgen
6f2a2857c8
hcq2: refactor deps ( #16490 )
2026-06-03 23:20:24 +03:00
chenyu
8a4203638a
make full with buffer=False deviceless ( #16483 )
...
affects arange and eye
2026-06-03 12:35:59 -04:00
qazal
405866f2b7
viz: improve kernel_graph.py usability ( #16486 )
...
* better default
* always format kernel output
* also show ref
* sched num
2026-06-03 21:12:44 +09:00
wozeparrot
7dcfd144b6
llama: columnwise fp8 scaling ( #16480 )
2026-06-02 18:55:45 -07:00
George Hotz
ffadd7a315
remove intel and amx support ( #16482 )
2026-06-02 18:53:05 -07:00
nimlgen
99e37b1ee3
hcq2: deps ( #16459 )
...
* start
* sin
* f
2026-06-02 22:34:25 +03:00
qazal
854eac09c6
llama: no E_ copy after bf16 GEMM ( #16458 )
2026-06-02 14:14:13 +09:00
chenyu
7e7b481ba7
less CONST(DEVICE) ( #16452 )
...
* less CONST(DEVICE)
no DEVICE for single device in const_like, multi has other issues
* maybe
* that?
2026-06-01 15:55:12 -04:00
qazal
29b47a0057
llama: update local amax implementation after ParamArgs change ( #16446 )
...
* local amax failing test
* update _local_abs_max_fxn
2026-05-30 16:55:43 +09:00
Christopher Milan
434cfa96a3
ci: no fetch in backend tests ( #16438 )
...
should make for less actions cache thrashing
2026-05-29 17:11:16 -04:00
nimlgen
d69aca41a9
hcq2: rework pm_bufferize ( #16431 )
2026-05-29 22:09:52 +03:00
George Hotz
1e7f1dcf49
add ParamArgs [pr] ( #16421 )
...
* add ParamArgs
* fix export
* cleanups
* fixes
* simpler
2026-05-28 19:17:17 -07:00
nimlgen
b0e49afaf1
hcq2: new multi ( #16413 )
...
* hcq2: new multi
* op
2026-05-28 22:16:10 +03:00
George Hotz
edca5df25a
flip offset and shape in pad and shrink ( #16414 )
...
* flip offset and shape in pad and shrink
* dumb test
2026-05-28 11:58:19 -07:00
George Hotz
8ee3a37524
shrink/pad use (new_shape, offset) ( #16405 )
...
* shrink uses offset and shape
* pad does too
* fix
2026-05-27 15:13:08 -07:00
qazal
452c7d4230
llama: don't allocate grad_xw13 in bf16 ( #16359 )
2026-05-28 04:33:07 +09:00
nimlgen
0c385e31c6
hcq2 rewrite ( #16375 )
...
* hcq2 rewrite
* fi
* x
* simpler
2026-05-27 22:25:35 +03:00
chenyu
c33b767407
bring back test and torch backend change for unique const ( #16403 )
2026-05-27 15:16:08 -04:00
chenyu
945ed4f689
revert const unique changes ( #16395 )
2026-05-27 00:06:41 -04:00
George Hotz
156a4438d9
rename BUFFER_VIEW to SLICE ( #16391 )
...
* rename BUFFER_VIEW to SLICE
* fix comments
2026-05-26 18:15:00 -07:00
chenyu
d861c50dce
remove unique_const ( #16382 )
2026-05-26 13:53:31 -04:00
chenyu
9b00defc8c
Revert "remove unique_const ( #16372 )" ( #16380 )
...
This reverts commit 09019d6761.
2026-05-26 12:30:07 -04:00
chenyu
09019d6761
remove unique_const ( #16372 )
...
* remove unique_const
* fix SDWA thing
* that?
2026-05-26 12:18:03 -04:00
George Hotz
7f1b02854e
bufferview offset is units of input dtype ( #16378 )
2026-05-26 08:49:31 -07:00
nimlgen
032905dec9
hcq2: simpler ( #16361 )
2026-05-26 14:28:48 +03:00
George Hotz
41ee7dab1c
script to generate testsig for DSP ( #16371 )
...
* script to generate testsig for DSP
* cleanups
2026-05-25 17:54:58 -07:00
George Hotz
942cb42b97
Revert "hotfix: bump Mac pytest timeout to 4 minutes"
...
This reverts commit 695a0069ed.
2026-05-25 17:25:11 -07:00
Christopher Milan
8ddd1328df
remove getenv(CI) ( #16365 )
...
gone everywhere except test_interop, because torch MPS does not work in actions
2026-05-25 20:23:33 -04:00
George Hotz
695a0069ed
hotfix: bump Mac pytest timeout to 4 minutes
2026-05-25 17:20:19 -07:00
George Hotz
689ab6a49f
move buffer view offset to src ( #16364 )
...
* this work?
* failed
2026-05-25 17:07:55 -07:00
qazal
eecd4706ff
fix mailbox comment, add types ( #16360 )
2026-05-25 22:24:00 +09:00
nimlgen
a891727c9f
hcq2: multi ( #16347 )
...
* hcq2: multi
* cleaner a bit
2026-05-24 19:28:33 +03:00
nimlgen
26b3b3f6a2
hcq2: move submit lowering to schedule ( #16330 )
...
* hcq: move submit lowering to schedule
* Dx
2026-05-22 23:15:19 +03:00
qazal
bbfe4f80ec
quantize_fp8 kernels in uops ( #16288 )
...
* add tests
* simple UOp kernel is n^2
* fast kernel matching c++, opts_to_apply=()
* remove cpp
* simple o(n) kernel, two passes
* fuse the loops
* works on DEV=CPU
* multi regression test
* fix multi, this can possibly be its own bugfix
* test cleanups
* minimal diff
* match C in UOps
* Revert "match C in UOps"
This reverts commit 0bef740c30.
* edit test
* match speed with C try 2
* needs_second_gpu
* cleanup
2026-05-22 20:54:06 +09:00
Christopher Milan
c2d06570a5
remove getenv(CI) from core tinygrad ( #16326 )
2026-05-21 22:20:33 -04:00
chenyu
31424cda71
Tensor.requires_grad -> is_param ( #16325 )
...
for optimizer
2026-05-21 19:39:57 -04:00
wozeparrot
afc5bfa183
llama: remove fused grad accum ( #16301 )
2026-05-21 09:38:40 -07:00
nimlgen
a321700baa
hcq2: multi prereqs ( #16304 )
2026-05-21 17:00:52 +03:00
Christopher Milan
172f9493e1
move is_dtype_supported to renderer ( #16226 )
2026-05-20 21:19:37 -04:00
George Hotz
58d58c1659
remove DEVECTORIZE ( #16290 )
...
* remove DEVECTORIZE
* fully remove DEVECTORIZE
2026-05-20 13:25:49 -07:00
nimlgen
a88feef40f
hcq2: cleanups ( #16278 )
...
* s
* simpler
* simpler
2026-05-20 21:48:50 +03:00