Commit Graph

1800 Commits

Author SHA1 Message Date
wozeparrot
528d35e306 llama speed 4 (#15993) 2026-04-30 17:14:41 -07:00
wozeparrot
eddcd4723b am_smi throttle info (#15997) 2026-04-30 15:28:32 -07:00
nimlgen
dfd2d07005 remove CompiledRunner (#15970)
* rm usage of CompiledRunner

* more tests

* last

* linter

* sink

* remove

* linter
2026-04-29 22:45:48 +03:00
qazal
a37b605523 remove arch from asm kernel class (#15977)
* rm arch from kernel

* update other tests

* update abstractions4.py
2026-04-30 03:39:52 +09:00
qazal
b63e0a5f74 viz/sqtt: move amd decoder to extra, don't import from ops_amd (#15969)
* don't import from ops_amd

* start

* cleanup
2026-04-30 00:49:15 +09:00
wozeparrot
ef09071073 llama: speed 2 (#15960) 2026-04-28 20:44:37 -07:00
Christopher Milan
e6863a1cc5 autogen: fewer type: ignores (#15956) 2026-04-28 21:58:13 -04:00
nimlgen
77965a22e5 local optimize as rewrite (#15953)
* local optimize as rewrite

* better

* x

* slightly rename

* fix

* ugh

* remove

* x

* remove

* not weak
2026-04-28 22:51:04 +03:00
qazal
b3f0f8d349 llama: fix missing label_smoothing arg (#15955) 2026-04-29 02:12:14 +09:00
wozeparrot
5e861cd2c4 llama: move llama kernels to llama_kernels (#15952) 2026-04-27 22:48:53 -07:00
nimlgen
4164666c72 programinfo (#15942)
* programinfo

* fix

* m

* x

* x

* changes

* x

* fix

* rm
2026-04-27 23:12:03 +03:00
qazal
8c174bdad4 viz/sqtt: correct exec pipes (#15885)
* wmma

* p2

* test

* left

* work

* pickle

* handwritten failing tests

* start work

* test the pipes

* empirical evidence

* update rdna4 enum types

* VALU pipe 1

* TRANSCENDENTAL pipe

* transcendental function units

* reorder

* wmma pipe

* cleanup and notes

* smaller

* work

* diff cleanup

* pickle

* use se:1

* int
2026-04-28 05:05:49 +09:00
nimlgen
bb652352c7 remove execitem (#15932)
* remove execitem

* f

* x
2026-04-25 19:33:04 +03:00
nimlgen
768106a542 remove schedule from extra/docs/examples (#15929)
* remove schedule from extra/docs/examples

* f
2026-04-25 14:09:12 +03:00
Denys Melnyk
1fdcb13bfb webgpu: fix weight lookup in export_model after compile_net key change (#15919)
* fix lookup site in export_model_webgpu after refactoring

webgpu (sd): fix export_model weight lookup after compile_net changes

fix lookup site in export_model_webgpu after refactoring

* add regression test
2026-04-25 10:04:55 +03:00
wozeparrot
4b908b6e2c llama: fused ce loss (#15920) 2026-04-24 20:01:24 -07:00
nimlgen
f2751955cb remove linear_to_schedule from tests (#15912)
* remove linear_to_schedule from tests

* x
2026-04-24 20:02:10 +03:00
qazal
f379b5a40a sqtt: match amd's TS_DELTA_SHORT offset (#15901) 2026-04-24 06:41:22 +03:00
wozeparrot
d3cbd781d9 llama: use fused norm mul quantize for w13 (#15878) 2026-04-22 21:27:41 -07:00
nimlgen
e5891acab2 jit: precompile (#15848)
* x

* jit: precompile as sep step

* x

* s

* x

* x

* x

* ?

* ?

* x

* x

* viz

* f

* x

* u

* x

* x
2026-04-23 00:23:32 +03:00
wozeparrot
87378331e8 llama: fused mul quantize fp8 (#15863) 2026-04-21 20:58:37 -07:00
chenyu
9192c93b7e Tensor.invalid -> Tensor.invalids (#15849)
matches ones and zeros, and to not share name with UOp.invalid
2026-04-21 11:19:51 -04:00
nimlgen
bfe28ee2ad rm run_schedule (#15847) 2026-04-21 18:14:30 +03:00
nimlgen
ae9b84d32f rm beam uop (#15844) 2026-04-21 13:10:26 +03:00
qazal
f9655af2a3 viz/cli: move to tinygrad (#15835)
* move cli

* update imports

* cleanup the readme

* edit

* work

* details

* python -m tinygrad.viz.cli

* do not execv in non tty

* option

* lint

* simpler

* gemm pmc
2026-04-21 13:35:10 +09:00
qazal
601b9d3f59 viz/cli: dedup DEBUG=3 pyrender (#15826) 2026-04-20 19:29:09 +09:00
qazal
b05b1010bf viz/cli: ux cleanups, show user python (#15817)
* small fixes

* print python trace

* jsonl

* cleanup fmt, fix tqdm

* print mode

* types

* less

* keep those

* fix

* everyone can print json

* pmc p2
2026-04-20 03:50:48 +03:00
qazal
c6d8753ee1 viz/cli: --json support, refine docs (#15528)
* refine

* remove

* refine

* keep

* need to say this

* back

* feedback

* feedback

* json

* dur_ms

* et_ms

* remove useless thing

* docs

* respect NO_COLOR

* DEBUG also produces valid json
2026-04-19 21:53:38 +03:00
wozeparrot
f28ea84de2 llama: fused silu fp8 amax (#15798)
* llama: combined w13

* llama: fused swiglu+fp8

* llama: fix amax interleaving

* llama: don't need separate matmul
2026-04-19 12:03:55 +08:00
nimlgen
022d8c4a11 remove jit_cache usage in extra/examples (#15808)
* remove jit_cache usage in extra/examples

* cached
2026-04-18 23:00:18 +03:00
qazal
2581985532 viz/cli: multi device profiler output, print markers (#15795)
* yield

* all devices

* better

* add unittests

* markers like this

* profile_markers work

* less

* update README

* tiny and null
2026-04-17 23:40:10 +03:00
qazal
a227dbece1 viz/cli: reconstruct DEBUG output (#15791)
* work

* work

* ext

* padding

* at time

* work

* reorder

* less flags

* num_rows

* feedback

* pmc
2026-04-17 18:27:58 +03:00
qazal
afc3904e58 viz/cli: unit tests in CI (#15788)
* simple failing test

* test stdout

* cleanup sqttmap
2026-04-17 22:34:44 +09:00
qazal
7bdb3adbbf viz/cli: simplification and reordering (#15785)
* remove

* work

* this is all one thing

* the reorder
2026-04-17 15:16:07 +03:00
wozeparrot
9e60e4a7e7 llama: native fp8 (#15733) 2026-04-16 22:16:05 -07:00
qazal
0e69388f6b viz/cli: add DEBUG, optional number of rows (#15777)
* tabulate switch

* support DEBUG

* --top

* improve

* work

* feedback

* 0

* print_kernel both ways

* simplify
2026-04-17 04:36:47 +03:00
qazal
6d9320ffb3 add NO_COLOR (#15765)
* NO_COLOR in cli

* add in helpers

* rm flags

* docs

* fix that

* temp

* Revert "temp"

This reverts commit 7522e664f6.
2026-04-16 22:44:55 +03:00
qazal
12c653a743 remove opts arg in get_program, everything uses opts_to_apply [pr] (#15767)
* check Ops.BEAM in process replay

* remove opts from the get_program api

* lint

* simplify

* cleanup
2026-04-16 22:42:43 +03:00
qazal
126cda45f8 viz/cli: cleanups, add memory printer (#15762)
* simple repro

* use context

* work

* memory printer

* rm

* memory printer

* pylint
2026-04-16 22:44:47 +09:00
George Hotz
d1cce7a476 put the ranges on store instead of after (#15759)
* put the ranges on store instead of after

* better assert

* fix stuff

* comment out slow rules i don't understand

* simpler rule

* closer

* return false for store

* fix loop

* only a few schedule failures remain

* remove stores to self

* all tests pass locally

* remove junk

* regression test and fix

* better test, bump broken torch count

* bugfix with regression test

* new fusion is better
2026-04-16 19:06:40 +08:00
qazal
1f26584b2e viz/cli: cleanups from linter (#15745)
* run linter

* pmc
2026-04-16 03:36:24 +09:00
chenyu
3394d18066 size*itemsize -> nbytes (#15729)
and some UOp.size removal to prep for size to mixin change
2026-04-14 16:27:54 -04:00
qazal
905b8adc97 viz: cli and server cleanups (#15713)
* update get_profile arg[0]

* uop_to_json arg[0]

* data is standalone in cli
2026-04-14 06:42:29 +09:00
George Hotz
16f50a40a5 remove REMU from tree (#15706)
* no more compare emulators

* remove remu from tree
2026-04-13 20:43:08 +08:00
qazal
ac027055ef viz: no global state (#15705)
* start viz data

* get_full_rewrites also moves

* update ref_map

* work

* update consumers

* cleaner cli

* linter

* cleanup tests

* back

* better

* sqtt tests
2026-04-13 21:35:20 +09:00
wozeparrot
457508d5a0 llama: save more 2 (#15681) 2026-04-11 01:03:36 -07:00
wozeparrot
55bcd7cc9e llama amax outside (#15670) 2026-04-09 23:08:03 -07:00
nimlgen
057dc173ab beam uop (#15660)
* beam as uop

* x
2026-04-09 19:13:03 +03:00
George Hotz
48a7627b04 add RDNA4 support to copy WMMA (#15663)
* add RDNA4 support to copy WMMA

* simpler

* simpler

* comment

* assert
2026-04-09 22:48:20 +08:00
qazal
742b3894d7 viz/cli: add pmc printer (#15651)
* viz/cli: add pmc printer

* cli work

* s

* linter

* pack workgroups

* add : to wgp

* counter name
2026-04-09 08:50:54 +09:00