Commit Graph

1857 Commits

Author SHA1 Message Date
nimlgen
70c2480e71 hcq2 to extra (#16126)
* hcq2 in extra

* correct

* some revert from non-extra

* cln

* cpu

* x

* attach

* min

* remove attach

* linter
2026-05-11 17:17:30 +03:00
nimlgen
ad9738892c get_buf() for Buffer (#16134)
* p

* mypy

* x
2026-05-11 16:36:14 +03:00
Christopher Milan
faabe6aa42 nv: remaining firmware from /lib/firmware (#16088) 2026-05-07 23:07:43 -04:00
Christopher Milan
9a6f7f7576 nv: look for fmc firmware in /lib/firmware (#16080) 2026-05-07 18:08:27 -04:00
nimlgen
2f0aa884d5 tinygpu: minimal is macos13 for resets (#16075) 2026-05-07 21:25:56 +03:00
wozeparrot
730fa66bf3 llama speed 6 (#16071) 2026-05-06 20:51:03 -07:00
wozeparrot
ab6218bc92 llama mp fixes (#16050) 2026-05-05 15:35:32 -07:00
wozeparrot
528d35e306 llama speed 4 (#15993) 2026-04-30 17:14:41 -07:00
wozeparrot
eddcd4723b am_smi throttle info (#15997) 2026-04-30 15:28:32 -07:00
nimlgen
dfd2d07005 remove CompiledRunner (#15970)
* rm usage of CompiledRunner

* more tests

* last

* linter

* sink

* remove

* linter
2026-04-29 22:45:48 +03:00
qazal
a37b605523 remove arch from asm kernel class (#15977)
* rm arch from kernel

* update other tests

* update abstractions4.py
2026-04-30 03:39:52 +09:00
qazal
b63e0a5f74 viz/sqtt: move amd decoder to extra, don't import from ops_amd (#15969)
* don't import from ops_amd

* start

* cleanup
2026-04-30 00:49:15 +09:00
wozeparrot
ef09071073 llama: speed 2 (#15960) 2026-04-28 20:44:37 -07:00
Christopher Milan
e6863a1cc5 autogen: fewer type: ignores (#15956) 2026-04-28 21:58:13 -04:00
nimlgen
77965a22e5 local optimize as rewrite (#15953)
* local optimize as rewrite

* better

* x

* slightly rename

* fix

* ugh

* remove

* x

* remove

* not weak
2026-04-28 22:51:04 +03:00
qazal
b3f0f8d349 llama: fix missing label_smoothing arg (#15955) 2026-04-29 02:12:14 +09:00
wozeparrot
5e861cd2c4 llama: move llama kernels to llama_kernels (#15952) 2026-04-27 22:48:53 -07:00
nimlgen
4164666c72 programinfo (#15942)
* programinfo

* fix

* m

* x

* x

* changes

* x

* fix

* rm
2026-04-27 23:12:03 +03:00
qazal
8c174bdad4 viz/sqtt: correct exec pipes (#15885)
* wmma

* p2

* test

* left

* work

* pickle

* handwritten failing tests

* start work

* test the pipes

* empirical evidence

* update rdna4 enum types

* VALU pipe 1

* TRANSCENDENTAL pipe

* transcendental function units

* reorder

* wmma pipe

* cleanup and notes

* smaller

* work

* diff cleanup

* pickle

* use se:1

* int
2026-04-28 05:05:49 +09:00
nimlgen
bb652352c7 remove execitem (#15932)
* remove execitem

* f

* x
2026-04-25 19:33:04 +03:00
nimlgen
768106a542 remove schedule from extra/docs/examples (#15929)
* remove schedule from extra/docs/examples

* f
2026-04-25 14:09:12 +03:00
Denys Melnyk
1fdcb13bfb webgpu: fix weight lookup in export_model after compile_net key change (#15919)
* fix lookup site in export_model_webgpu after refactoring

webgpu (sd): fix export_model weight lookup after compile_net changes

fix lookup site in export_model_webgpu after refactoring

* add regression test
2026-04-25 10:04:55 +03:00
wozeparrot
4b908b6e2c llama: fused ce loss (#15920) 2026-04-24 20:01:24 -07:00
nimlgen
f2751955cb remove linear_to_schedule from tests (#15912)
* remove linear_to_schedule from tests

* x
2026-04-24 20:02:10 +03:00
qazal
f379b5a40a sqtt: match amd's TS_DELTA_SHORT offset (#15901) 2026-04-24 06:41:22 +03:00
wozeparrot
d3cbd781d9 llama: use fused norm mul quantize for w13 (#15878) 2026-04-22 21:27:41 -07:00
nimlgen
e5891acab2 jit: precompile (#15848)
* x

* jit: precompile as sep step

* x

* s

* x

* x

* x

* ?

* ?

* x

* x

* viz

* f

* x

* u

* x

* x
2026-04-23 00:23:32 +03:00
wozeparrot
87378331e8 llama: fused mul quantize fp8 (#15863) 2026-04-21 20:58:37 -07:00
chenyu
9192c93b7e Tensor.invalid -> Tensor.invalids (#15849)
matches ones and zeros, and to not share name with UOp.invalid
2026-04-21 11:19:51 -04:00
nimlgen
bfe28ee2ad rm run_schedule (#15847) 2026-04-21 18:14:30 +03:00
nimlgen
ae9b84d32f rm beam uop (#15844) 2026-04-21 13:10:26 +03:00
qazal
f9655af2a3 viz/cli: move to tinygrad (#15835)
* move cli

* update imports

* cleanup the readme

* edit

* work

* details

* python -m tinygrad.viz.cli

* do not execv in non tty

* option

* lint

* simpler

* gemm pmc
2026-04-21 13:35:10 +09:00
qazal
601b9d3f59 viz/cli: dedup DEBUG=3 pyrender (#15826) 2026-04-20 19:29:09 +09:00
qazal
b05b1010bf viz/cli: ux cleanups, show user python (#15817)
* small fixes

* print python trace

* jsonl

* cleanup fmt, fix tqdm

* print mode

* types

* less

* keep those

* fix

* everyone can print json

* pmc p2
2026-04-20 03:50:48 +03:00
qazal
c6d8753ee1 viz/cli: --json support, refine docs (#15528)
* refine

* remove

* refine

* keep

* need to say this

* back

* feedback

* feedback

* json

* dur_ms

* et_ms

* remove useless thing

* docs

* respect NO_COLOR

* DEBUG also produces valid json
2026-04-19 21:53:38 +03:00
wozeparrot
f28ea84de2 llama: fused silu fp8 amax (#15798)
* llama: combined w13

* llama: fused swiglu+fp8

* llama: fix amax interleaving

* llama: don't need separate matmul
2026-04-19 12:03:55 +08:00
nimlgen
022d8c4a11 remove jit_cache usage in extra/examples (#15808)
* remove jit_cache usage in extra/examples

* cached
2026-04-18 23:00:18 +03:00
qazal
2581985532 viz/cli: multi device profiler output, print markers (#15795)
* yield

* all devices

* better

* add unittests

* markers like this

* profile_markers work

* less

* update README

* tiny and null
2026-04-17 23:40:10 +03:00
qazal
a227dbece1 viz/cli: reconstruct DEBUG output (#15791)
* work

* work

* ext

* padding

* at time

* work

* reorder

* less flags

* num_rows

* feedback

* pmc
2026-04-17 18:27:58 +03:00
qazal
afc3904e58 viz/cli: unit tests in CI (#15788)
* simple failing test

* test stdout

* cleanup sqttmap
2026-04-17 22:34:44 +09:00
qazal
7bdb3adbbf viz/cli: simplification and reordering (#15785)
* remove

* work

* this is all one thing

* the reorder
2026-04-17 15:16:07 +03:00
wozeparrot
9e60e4a7e7 llama: native fp8 (#15733) 2026-04-16 22:16:05 -07:00
qazal
0e69388f6b viz/cli: add DEBUG, optional number of rows (#15777)
* tabulate switch

* support DEBUG

* --top

* improve

* work

* feedback

* 0

* print_kernel both ways

* simplify
2026-04-17 04:36:47 +03:00
qazal
6d9320ffb3 add NO_COLOR (#15765)
* NO_COLOR in cli

* add in helpers

* rm flags

* docs

* fix that

* temp

* Revert "temp"

This reverts commit 7522e664f6.
2026-04-16 22:44:55 +03:00
qazal
12c653a743 remove opts arg in get_program, everything uses opts_to_apply [pr] (#15767)
* check Ops.BEAM in process replay

* remove opts from the get_program api

* lint

* simplify

* cleanup
2026-04-16 22:42:43 +03:00
qazal
126cda45f8 viz/cli: cleanups, add memory printer (#15762)
* simple repro

* use context

* work

* memory printer

* rm

* memory printer

* pylint
2026-04-16 22:44:47 +09:00
George Hotz
d1cce7a476 put the ranges on store instead of after (#15759)
* put the ranges on store instead of after

* better assert

* fix stuff

* comment out slow rules i don't understand

* simpler rule

* closer

* return false for store

* fix loop

* only a few schedule failures remain

* remove stores to self

* all tests pass locally

* remove junk

* regression test and fix

* better test, bump broken torch count

* bugfix with regression test

* new fusion is better
2026-04-16 19:06:40 +08:00
qazal
1f26584b2e viz/cli: cleanups from linter (#15745)
* run linter

* pmc
2026-04-16 03:36:24 +09:00
chenyu
3394d18066 size*itemsize -> nbytes (#15729)
and some UOp.size removal to prep for size to mixin change
2026-04-14 16:27:54 -04:00
qazal
905b8adc97 viz: cli and server cleanups (#15713)
* update get_profile arg[0]

* uop_to_json arg[0]

* data is standalone in cli
2026-04-14 06:42:29 +09:00