George Hotz
8ee3a37524
shrink/pad use (new_shape, offset) ( #16405 )
...
* shrink uses offset and shape
* pad does too
* fix
2026-05-27 15:13:08 -07:00
qazal
452c7d4230
llama: don't allocate grad_xw13 in bf16 ( #16359 )
2026-05-28 04:33:07 +09:00
nimlgen
0c385e31c6
hcq2 rewrite ( #16375 )
...
* hcq2 rewrite
* fi
* x
* simpler
2026-05-27 22:25:35 +03:00
chenyu
c33b767407
bring back test and torch backend change for unique const ( #16403 )
2026-05-27 15:16:08 -04:00
chenyu
945ed4f689
revert const unique changes ( #16395 )
2026-05-27 00:06:41 -04:00
George Hotz
156a4438d9
rename BUFFER_VIEW to SLICE ( #16391 )
...
* rename BUFFER_VIEW to SLICE
* fix comments
2026-05-26 18:15:00 -07:00
chenyu
d861c50dce
remove unique_const ( #16382 )
2026-05-26 13:53:31 -04:00
chenyu
9b00defc8c
Revert "remove unique_const ( #16372 )" ( #16380 )
...
This reverts commit 09019d6761.
2026-05-26 12:30:07 -04:00
chenyu
09019d6761
remove unique_const ( #16372 )
...
* remove unique_const
* fix SDWA thing
* that?
2026-05-26 12:18:03 -04:00
George Hotz
7f1b02854e
bufferview offset is units of input dtype ( #16378 )
2026-05-26 08:49:31 -07:00
nimlgen
032905dec9
hcq2: simpler ( #16361 )
2026-05-26 14:28:48 +03:00
George Hotz
41ee7dab1c
script to generate testsig for DSP ( #16371 )
...
* script to generate testsig for DSP
* cleanups
2026-05-25 17:54:58 -07:00
George Hotz
942cb42b97
Revert "hotfix: bump Mac pytest timeout to 4 minutes"
...
This reverts commit 695a0069ed.
2026-05-25 17:25:11 -07:00
Christopher Milan
8ddd1328df
remove getenv(CI) ( #16365 )
...
gone everywhere except test_interop, because torch MPS does not work in actions
2026-05-25 20:23:33 -04:00
George Hotz
695a0069ed
hotfix: bump Mac pytest timeout to 4 minutes
2026-05-25 17:20:19 -07:00
George Hotz
689ab6a49f
move buffer view offset to src ( #16364 )
...
* this work?
* failed
2026-05-25 17:07:55 -07:00
qazal
eecd4706ff
fix mailbox comment, add types ( #16360 )
2026-05-25 22:24:00 +09:00
nimlgen
a891727c9f
hcq2: multi ( #16347 )
...
* hcq2: multi
* cleaner a bit
2026-05-24 19:28:33 +03:00
nimlgen
26b3b3f6a2
hcq2: move submit lowering to schedule ( #16330 )
...
* hcq: move submit lowering to schedule
* Dx
2026-05-22 23:15:19 +03:00
qazal
bbfe4f80ec
quantize_fp8 kernels in uops ( #16288 )
...
* add tests
* simple UOp kernel is n^2
* fast kernel matching c++, opts_to_apply=()
* remove cpp
* simple o(n) kernel, two passes
* fuse the loops
* works on DEV=CPU
* multi regression test
* fix multi, this can possibly be its own bugfix
* test cleanups
* minimal diff
* match C in UOps
* Revert "match C in UOps"
This reverts commit 0bef740c30.
* edit test
* match speed with C try 2
* needs_second_gpu
* cleanup
2026-05-22 20:54:06 +09:00
Christopher Milan
c2d06570a5
remove getenv(CI) from core tinygrad ( #16326 )
2026-05-21 22:20:33 -04:00
chenyu
31424cda71
Tensor.requires_grad -> is_param ( #16325 )
...
for optimizer
2026-05-21 19:39:57 -04:00
wozeparrot
afc5bfa183
llama: remove fused grad accum ( #16301 )
2026-05-21 09:38:40 -07:00
nimlgen
a321700baa
hcq2: multi prereqs ( #16304 )
2026-05-21 17:00:52 +03:00
Christopher Milan
172f9493e1
move is_dtype_supported to renderer ( #16226 )
2026-05-20 21:19:37 -04:00
George Hotz
58d58c1659
remove DEVECTORIZE ( #16290 )
...
* remove DEVECTORIZE
* fully remove DEVECTORIZE
2026-05-20 13:25:49 -07:00
nimlgen
a88feef40f
hcq2: cleanups ( #16278 )
...
* s
* simpler
* simler
2026-05-20 21:48:50 +03:00
qazal
1e0fffe256
fused ce llama kernel in UOps ( #16263 )
...
* work
* using uops
* delete things
* work
* work
* higher level uops
* cleanups
2026-05-20 19:45:28 +09:00
nimlgen
b3dcf8f452
hcq2: split into schedule/realize ( #16216 )
...
* hcq2: split into schedule/realize
* missing
* x
* f
* clean
* cleaner
* x
* x
* x
* x
* x
2026-05-19 16:40:17 +03:00
qazal
e4350e7de9
set hipcc mac docker to 7.1 ( #16261 )
...
* set hipcc mac docker to 7.1
* pull from amd
2026-05-19 21:30:39 +09:00
qazal
bfb2d1f89a
Revert "fp8 gemm speedup ( #16236 )" ( #16245 )
...
This reverts commit d95bf394e1.
2026-05-19 02:01:44 +09:00
chenyu
dcee90aa3f
remove requires_grad use in extra/examples ( #16238 )
...
except the ones fed into optimizer
2026-05-16 18:40:26 -04:00
qazal
d95bf394e1
fp8 gemm speedup ( #16236 )
...
* add asm_gemm option
* milestone
* work
* edit
* only the fast kernel
* diff
2026-05-17 04:58:28 +09:00
qazal
d54fa86b71
viz/cli: select all calls in graph by default ( #16214 )
2026-05-15 21:01:44 +09:00
nimlgen
28b98e529d
nv: move structs to vram ( #16184 )
...
* nv: vram
* x
* 4090
* x
* move and sysmem on macos
* x
* remove hp
2026-05-15 13:41:42 +03:00
chenyu
409bb0c9ad
requires_grad cannot be None ( #16212 )
...
final goal is to remove requires_grad, first change the default to True, and don't allow None
2026-05-15 02:01:04 -04:00
nimlgen
62ea73719d
hcq2: share more with graph ( #16196 )
...
* share more with graph
* comment
2026-05-14 22:28:11 +03:00
nimlgen
ad1fb7c981
hcq2: graph ( #16186 )
...
* keep this for now
* early graph
2026-05-13 22:49:43 +03:00
wozeparrot
e97f2c1114
llama: only gemm + fa custom kernel ( #16180 )
...
* llama: tie store to grad directly
* llama: set mp flags
* llama: non fused grad fp8 quantize path
2026-05-12 21:03:49 -07:00
Christopher Milan
316607f004
dsp: don't use docker in ci ( #16167 )
...
* dsp: don't use docker in ci
* add setup script for macos docker
2026-05-12 17:11:03 -04:00
nimlgen
e93fb5f9b9
hcq2: remove hcqprogram ( #16157 )
...
* hcq2 rm program
* nonbeauty
* no prog
* tiny
* f
* x
2026-05-12 18:49:13 +03:00
nimlgen
e5729935c6
time_call ( #16152 )
...
* time_call
* x
* fix caches
2026-05-12 16:58:28 +03:00
chenyu
3942a80f66
fix wrong kwargs passed into rands ( #16149 )
...
working towards explicit args for these
2026-05-11 22:22:06 -04:00
Vikram Rangarajan
effa263865
Torch backend aten::cat.out fix ( #16121 )
...
* Handle empty 1D tensors in cat_out
* Undid other changes
* Fixed torch cat
* Improved cat.out, added more tests
* Cleaned code
* Type hinted dim
* Removed whitespace
2026-05-11 16:28:16 -07:00
qazal
fc2cc1d77a
viz: call graph renderer example ( #16141 )
...
* work
* emits
* this
* cleaner repr for custom binaries
* --call-graph
* _ref
* this
* start
* this
* everything execpt the pyrender
* bring pyrender back
2026-05-12 05:07:30 +09:00
nimlgen
70c2480e71
hcq2 to extra ( #16126 )
...
* hcq2 in extra
* correct
* some revert from non-extra
* cln
* cpu
* x
* attach
* min
* remove attach
* linter
2026-05-11 17:17:30 +03:00
nimlgen
ad9738892c
get_buf() for Buffer ( #16134 )
...
* p
* mypy
* x
2026-05-11 16:36:14 +03:00
Christopher Milan
faabe6aa42
nv: remaining firmware from /lib/firmware ( #16088 )
2026-05-07 23:07:43 -04:00
Christopher Milan
9a6f7f7576
nv: look for fmc firmware in /lib/firmware ( #16080 )
2026-05-07 18:08:27 -04:00
nimlgen
2f0aa884d5
tinygpu: minimal is macos13 for resets ( #16075 )
2026-05-07 21:25:56 +03:00