13143 Commits

Author SHA1 Message Date
George Hotz
5b8ff231cd Merge branch 'master' into remove_vec_3 2026-04-29 13:31:51 -07:00
qazal
55915584e5 viz: fix cfg for emulated amd on the null device (#15976)
* simple failing when i test it end to end

* pass

* linter

* assemble
2026-04-30 05:18:09 +09:00
nimlgen
dfd2d07005 remove CompiledRunner (#15970)
* rm usage of CompiledRunner

* more tests

* last

* linter

* sink

* remove

* linter
2026-04-29 22:45:48 +03:00
wozeparrot
0080489abe llama: use env vars (#15978) 2026-04-29 12:37:15 -07:00
qazal
a37b605523 remove arch from asm kernel class (#15977)
* rm arch from kernel

* update other tests

* update abstractions4.py
2026-04-30 03:39:52 +09:00
Christopher Milan
7a79c2948a DEV visible device filter supports hyphenated syntax (#15971) 2026-04-29 14:02:21 -04:00
Christopher Milan
6b9a45568c autogen: better version handling for llvm and libclang (#15975) 2026-04-29 14:01:33 -04:00
chenyu
654e611a29 _bits_to_rand to mixin (#15972) 2026-04-29 13:47:25 -04:00
George Hotz
a448b47e9c Merge branch 'master' into remove_vec_3 2026-04-29 10:30:18 -07:00
George Hotz
5f441ecffc unify reduce + reduce_axis (#15973)
* unify reduce + reduce_axis

* fix all tests

* lil cleanups
2026-04-29 10:29:56 -07:00
qazal
b63e0a5f74 viz/sqtt: move amd decoder to extra, don't import from ops_amd (#15969)
* don't import from ops_amd

* start

* cleanup
2026-04-30 00:49:15 +09:00
nimlgen
7787f76dcc get_runner -> get_runtime (#15967)
* get_runner -> get_runtime

* do not use get_runner

* fix

* remove get_tunner

* remove

* fix

* x
2026-04-29 18:29:49 +03:00
chenyu
fb188c3c23 UOp.bitcast noop early return (#15968)
matches Tensor
2026-04-29 09:41:40 -04:00
qazal
30403c1e25 viz/cli: merge DEBUG=6 and -i (#15966)
* print_step contiguous

* merge
2026-04-29 19:52:17 +09:00
qazal
86621e9e7c gate f32_to_fp8 renderer (#15964) 2026-04-29 19:12:46 +09:00
wozeparrot
ef09071073 llama: speed 2 (#15960) 2026-04-28 20:44:37 -07:00
Christopher Milan
e6863a1cc5 autogen: fewer type: ignores (#15956) 2026-04-28 21:58:13 -04:00
George Hotz
6a983bb72b fixes, reduce is broken 2026-04-28 18:54:15 -07:00
George Hotz
fedad1681f remove dtype vec, try 3 2026-04-28 18:34:31 -07:00
chenyu
836af56513 some RandMixin cleanup (#15961)
cleaner to just put inside OpMixin
2026-04-28 19:58:02 -04:00
chenyu
c4bea54e9c _threefry_random_bits to mixin (#15959)
start RandMixin
2026-04-28 19:13:57 -04:00
George Hotz
796fdf9fd8 end has no shape (#15958) 2026-04-28 15:15:48 -07:00
Miguel Villa Floran
b36010c55a DGX Spark and Jetson Thor support (#15939) 2026-04-28 18:08:21 -04:00
Nino Risteski
5eb1fd5d3c cleanup: untrack wait Metal buffers (#15954) 2026-04-28 12:54:59 -07:00
nimlgen
77965a22e5 local optimize as rewrite (#15953)
* local optimize as rewrite

* better

* x

* slightly rename

* fix

* ugh

* remove

* x

* remove

* not weak
2026-04-28 22:51:04 +03:00
qazal
b3f0f8d349 llama: fix missing label_smoothing arg (#15955) 2026-04-29 02:12:14 +09:00
wozeparrot
5e861cd2c4 llama: move llama kernels to llama_kernels (#15952) 2026-04-27 22:48:53 -07:00
Christopher Milan
987b6dd193 python -m tinygrad.device prints interface info (#15950) 2026-04-27 22:15:38 -04:00
qazal
54f00e1013 sqtt: correct rdna4 structs (#15948) 2026-04-28 07:35:50 +09:00
Charlie Kerfoot
890d7be0c3 fix: muon not using device (#15936) 2026-04-27 14:56:48 -07:00
qazal
c58fd85a99 sqtt: add needs_rocprof decorator (#15947)
* sqtt: add needs_rocprof decorator

* version string
2026-04-28 06:22:50 +09:00
Christopher Milan
3f508810d8 cpu: lowercase arch (#15943) 2026-04-27 17:05:25 -04:00
chenyu
77f9125c21 move Tensor.pad to OpMixin (#15946) 2026-04-27 16:56:04 -04:00
nimlgen
4164666c72 programinfo (#15942)
* programinfo

* fix

* m

* x

* x

* changes

* x

* fix

* rm
2026-04-27 23:12:03 +03:00
chenyu
fe38d6de94 _pad_circular and _pad_reflect_replicate to mixin (#15944) 2026-04-27 16:07:05 -04:00
qazal
8c174bdad4 viz/sqtt: correct exec pipes (#15885)
* wmma

* p2

* test

* left

* work

* pickle

* handwritten failing tests

* start work

* test the pipes

* empirical evidence

* update rdna4 enum types

* VALU pipe 1

* TRANSCENDENTAL pipe

* transcendental function units

* reorder

* wmma pipe

* cleanup and notes

* smaller

* work

* diff cleanup

* pickle

* use se:1

* int
2026-04-28 05:05:49 +09:00
qazal
eeb8d5eb0c viz: small ui changes (#15940)
* rename colors

* keep ctrl c
2026-04-27 04:00:13 +09:00
nimlgen
96165ff0d1 validate_with_cpu as rewrite (#15938)
* validate_with_cpu as rewrite

* compil

* x

* linter

* moved

* fix
2026-04-26 19:58:53 +03:00
nimlgen
117e9e22dd estimates from graph (#15937)
* estimates from graph

* test

* x
2026-04-26 18:22:53 +03:00
chenyu
e9983e3516 remove unused QCOMTextureInfo, QueueType [pr] (#15935) 2026-04-25 14:32:31 -04:00
nimlgen
ac3494a7cc remove some runners (#15934)
* remove runners

* mypy
2026-04-25 21:27:05 +03:00
nimlgen
bb652352c7 remove execitem (#15932)
* remove execitem

* f

* x
2026-04-25 19:33:04 +03:00
chenyu
e27444a0ff remove unused UOp.shard_size [pr] (#15933) 2026-04-25 12:27:58 -04:00
nimlgen
e0ff6cc15c remove old schedule (#15930)
* remove old schedule

* tests

* r

* x
2026-04-25 16:46:36 +03:00
qazal
9a23de7d27 viz/cli: unify profile and rewrites, -s ALL default (#15931)
* work

* workg

* better

* cleanup

* better defaults

* --ls

* better

* work

* update llama

* update
2026-04-25 22:31:24 +09:00
nimlgen
768106a542 remove schedule from extra/docs/examples (#15929)
* remove schedule from extra/docs/examples

* f
2026-04-25 14:09:12 +03:00
nimlgen
a5e9ea7a60 remove schedule batch 4 (#15927)
* remove schedule batch 4

* fini
2026-04-25 12:36:55 +03:00
nimlgen
d2ab6ea7a6 remove schedule batch 3 (#15924)
* remove schedule batch 3

* batch 6

* batch 7
2026-04-25 11:53:16 +03:00
nimlgen
3c8a2db870 remove schedule() from tests batch 2 (#15923)
* remove schedule() from tests batch 2

* batch 4
2026-04-25 10:44:41 +03:00
Denys Melnyk
1fdcb13bfb webgpu: fix weight lookup in export_model after compile_net key change (#15919)
* fix lookup site in export_model_webgpu after refactoring

webgpu (sd): fix export_model weight lookup after compile_net changes

fix lookup site in export_model_webgpu after refactoring

* add regression test
2026-04-25 10:04:55 +03:00