wozeparrot
528d35e306
llama speed 4 ( #15993 )
2026-04-30 17:14:41 -07:00
wozeparrot
eddcd4723b
am_smi throttle info ( #15997 )
2026-04-30 15:28:32 -07:00
nimlgen
dfd2d07005
remove CompiledRunner ( #15970 )
...
* rm usage of CompiledRunner
* more tests
* last
* linter
* sink
* remove
* linter
2026-04-29 22:45:48 +03:00
qazal
a37b605523
remove arch from asm kernel class ( #15977 )
...
* rm arch from kernel
* update other tests
* update abstractions4.py
2026-04-30 03:39:52 +09:00
qazal
b63e0a5f74
viz/sqtt: move amd decoder to extra, don't import from ops_amd ( #15969 )
...
* don't import from ops_amd
* start
* cleanup
2026-04-30 00:49:15 +09:00
wozeparrot
ef09071073
llama: speed 2 ( #15960 )
2026-04-28 20:44:37 -07:00
Christopher Milan
e6863a1cc5
autogen: fewer type: ignores ( #15956 )
2026-04-28 21:58:13 -04:00
nimlgen
77965a22e5
local optimize as rewrite ( #15953 )
...
* local optimize as rewrite
* better
* x
* slightly rename
* fix
* ugh
* remove
* x
* remove
* not weak
2026-04-28 22:51:04 +03:00
qazal
b3f0f8d349
llama: fix missing label_smoothing arg ( #15955 )
2026-04-29 02:12:14 +09:00
wozeparrot
5e861cd2c4
llama: move llama kernels to llama_kernels ( #15952 )
2026-04-27 22:48:53 -07:00
nimlgen
4164666c72
programinfo ( #15942 )
...
* programinfo
* fix
* m
* x
* x
* changes
* x
* fix
* rm
2026-04-27 23:12:03 +03:00
qazal
8c174bdad4
viz/sqtt: correct exec pipes ( #15885 )
...
* wmma
* p2
* test
* left
* work
* pickle
* handwritten failing tests
* start work
* test the pipes
* empirical evidence
* update rdna4 enum types
* VALU pipe 1
* TRANSCENDENTAL pipe
* transcendental function units
* reorder
* wmma pipe
* cleanup and notes
* smaller
* work
* diff cleanup
* pickle
* use se:1
* int
2026-04-28 05:05:49 +09:00
nimlgen
bb652352c7
remove execitem ( #15932 )
...
* remove execitem
* f
* x
2026-04-25 19:33:04 +03:00
nimlgen
768106a542
remove schedule from extra/docs/examples ( #15929 )
...
* remove schedule from extra/docs/examples
* f
2026-04-25 14:09:12 +03:00
Denys Melnyk
1fdcb13bfb
webgpu: fix weight lookup in export_model after compile_net key change ( #15919 )
...
* fix lookup site in export_model_webgpu after refactoring
webgpu (sd): fix export_model weight lookup after compile_net changes
* add regression test
2026-04-25 10:04:55 +03:00
wozeparrot
4b908b6e2c
llama: fused ce loss ( #15920 )
2026-04-24 20:01:24 -07:00
nimlgen
f2751955cb
remove linear_to_schedule from tests ( #15912 )
...
* remove linear_to_schedule from tests
* x
2026-04-24 20:02:10 +03:00
qazal
f379b5a40a
sqtt: match amd's TS_DELTA_SHORT offset ( #15901 )
2026-04-24 06:41:22 +03:00
wozeparrot
d3cbd781d9
llama: use fused norm mul quantize for w13 ( #15878 )
2026-04-22 21:27:41 -07:00
nimlgen
e5891acab2
jit: precompile ( #15848 )
...
* x
* jit: precompile as sep step
* x
* s
* x
* x
* x
* ?
* ?
* x
* x
* viz
* f
* x
* u
* x
* x
2026-04-23 00:23:32 +03:00
wozeparrot
87378331e8
llama: fused mul quantize fp8 ( #15863 )
2026-04-21 20:58:37 -07:00
chenyu
9192c93b7e
Tensor.invalid -> Tensor.invalids ( #15849 )
...
matches ones and zeros, and to not share name with UOp.invalid
2026-04-21 11:19:51 -04:00
nimlgen
bfe28ee2ad
rm run_schedule ( #15847 )
2026-04-21 18:14:30 +03:00
nimlgen
ae9b84d32f
rm beam uop ( #15844 )
2026-04-21 13:10:26 +03:00
qazal
f9655af2a3
viz/cli: move to tinygrad ( #15835 )
...
* move cli
* update imports
* cleanup the readme
* edit
* work
* details
* python -m tinygrad.viz.cli
* do not execv in non tty
* option
* lint
* simpler
* gemm pmc
2026-04-21 13:35:10 +09:00
qazal
601b9d3f59
viz/cli: dedup DEBUG=3 pyrender ( #15826 )
2026-04-20 19:29:09 +09:00
qazal
b05b1010bf
viz/cli: ux cleanups, show user python ( #15817 )
...
* small fixes
* print python trace
* jsonl
* cleanup fmt, fix tqdm
* print mode
* types
* less
* keep those
* fix
* everyone can print json
* pmc p2
2026-04-20 03:50:48 +03:00
qazal
c6d8753ee1
viz/cli: --json support, refine docs ( #15528 )
...
* refine
* remove
* refine
* keep
* need to say this
* back
* feedback
* feedback
* json
* dur_ms
* et_ms
* remove useless thing
* docs
* respect NO_COLOR
* DEBUG also produces valid json
2026-04-19 21:53:38 +03:00
wozeparrot
f28ea84de2
llama: fused silu fp8 amax ( #15798 )
...
* llama: combined w13
* llama: fused swiglu+fp8
* llama: fix amax interleaving
* llama: don't need separate matmul
2026-04-19 12:03:55 +08:00
nimlgen
022d8c4a11
remove jit_cache usage in extra/examples ( #15808 )
...
* remove jit_cache usage in extra/examples
* cached
2026-04-18 23:00:18 +03:00
qazal
2581985532
viz/cli: multi device profiler output, print markers ( #15795 )
...
* yield
* all devices
* better
* add unittests
* markers like this
* profile_markers work
* less
* update README
* tiny and null
2026-04-17 23:40:10 +03:00
qazal
a227dbece1
viz/cli: reconstruct DEBUG output ( #15791 )
...
* work
* work
* ext
* padding
* at time
* work
* reorder
* less flags
* num_rows
* feedback
* pmc
2026-04-17 18:27:58 +03:00
qazal
afc3904e58
viz/cli: unit tests in CI ( #15788 )
...
* simple failing test
* test stdout
* cleanup sqttmap
2026-04-17 22:34:44 +09:00
qazal
7bdb3adbbf
viz/cli: simplification and reordering ( #15785 )
...
* remove
* work
* this is all one thing
* the reorder
2026-04-17 15:16:07 +03:00
wozeparrot
9e60e4a7e7
llama: native fp8 ( #15733 )
2026-04-16 22:16:05 -07:00
qazal
0e69388f6b
viz/cli: add DEBUG, optional number of rows ( #15777 )
...
* tabulate switch
* support DEBUG
* --top
* improve
* work
* feedback
* 0
* print_kernel both ways
* simplify
2026-04-17 04:36:47 +03:00
qazal
6d9320ffb3
add NO_COLOR ( #15765 )
...
* NO_COLOR in cli
* add in helpers
* rm flags
* docs
* fix that
* temp
* Revert "temp"
This reverts commit 7522e664f6.
2026-04-16 22:44:55 +03:00
qazal
12c653a743
remove opts arg in get_program, everything uses opts_to_apply [pr] ( #15767 )
...
* check Ops.BEAM in process replay
* remove opts from the get_program api
* lint
* simplify
* cleanup
2026-04-16 22:42:43 +03:00
qazal
126cda45f8
viz/cli: cleanups, add memory printer ( #15762 )
...
* simple repro
* use context
* work
* memory printer
* rm
* memory printer
* pylint
2026-04-16 22:44:47 +09:00
George Hotz
d1cce7a476
put the ranges on store instead of after ( #15759 )
...
* put the ranges on store instead of after
* better assert
* fix stuff
* comment out slow rules i don't understand
* simpler rule
* closer
* return false for store
* fix loop
* only a few schedule failures remain
* remove stores to self
* all tests pass locally
* remove junk
* regression test and fix
* better test, bump broken torch count
* bugfix with regression test
* new fusion is better
2026-04-16 19:06:40 +08:00
qazal
1f26584b2e
viz/cli: cleanups from linter ( #15745 )
...
* run linter
* pmc
2026-04-16 03:36:24 +09:00
chenyu
3394d18066
size*itemsize -> nbytes ( #15729 )
...
and some UOp.size removal to prep for size to mixin change
2026-04-14 16:27:54 -04:00
qazal
905b8adc97
viz: cli and server cleanups ( #15713 )
...
* update get_profile arg[0]
* uop_to_json arg[0]
* data is standalone in cli
2026-04-14 06:42:29 +09:00
George Hotz
16f50a40a5
remove REMU from tree ( #15706 )
...
* no more compare emulators
* remove remu from tree
2026-04-13 20:43:08 +08:00
qazal
ac027055ef
viz: no global state ( #15705 )
...
* start viz data
* get_full_rewrites also moves
* update ref_map
* work
* update consumers
* cleaner cli
* linter
* cleanup tests
* back
* better
* sqtt tests
2026-04-13 21:35:20 +09:00
wozeparrot
457508d5a0
llama: save more 2 ( #15681 )
2026-04-11 01:03:36 -07:00
wozeparrot
55bcd7cc9e
llama amax outside ( #15670 )
2026-04-09 23:08:03 -07:00
nimlgen
057dc173ab
beam uop ( #15660 )
...
* beam as uop
* x
2026-04-09 19:13:03 +03:00
George Hotz
48a7627b04
add RDNA4 support to copy WMMA ( #15663 )
...
* add RDNA4 support to copy WMMA
* simpler
* simpler
* comment
* assert
2026-04-09 22:48:20 +08:00
qazal
742b3894d7
viz/cli: add pmc printer ( #15651 )
...
* viz/cli: add pmc printer
* cli work
* s
* linter
* pack workgroups
* add : to wgp
* counter name
2026-04-09 08:50:54 +09:00