13162 Commits

Author SHA1 Message Date
George Hotz
35e13e08d5 improve shapes to make them behave like dtype.count, try 2 2026-05-01 10:27:54 -07:00
nimlgen
1079441332 revoke bus master (#16007) 2026-05-01 18:00:01 +03:00
qazal
8b147a9ed5 minimal repro for llama copies 2 (#16011) 2026-05-01 22:23:47 +09:00
qazal
a29dd7b19b Revert "cleanup: untrack wait Metal buffers (#15954)" (#16010)
* Revert "cleanup: untrack wait Metal buffers (#15954)"

This reverts commit 5eb1fd5d3c.

* regression test fixes
2026-05-01 21:18:19 +09:00
qazal
65879fe1b7 metal synchronize regression test (#16008)
* add test for metal wait=True

* add self.assertRaises
2026-05-01 20:10:57 +09:00
nimlgen
f6d92b55e6 am: use per pipe reset for gfx11+ (#16006) 2026-05-01 12:56:43 +03:00
Christopher Milan
cee73becbe am: ip offsets in autogen (#16003) 2026-05-01 00:13:52 -04:00
George Hotz
4506688285 split render to render.py (#16002)
* split render to render.py

* move more print
2026-04-30 19:41:14 -07:00
George Hotz
d651b4bbf0 SPEC=3 checks the shape (#16001)
* SPEC=3 checks the shape

* buffer view

* Revert "buffer view"

This reverts commit ffd87889a9.

* buffer view hack

* fix ptx
2026-04-30 18:41:37 -07:00
wozeparrot
528d35e306 llama speed 4 (#15993) 2026-04-30 17:14:41 -07:00
George Hotz
45fd7a3668 lil_image vectorize (#16000)
* lil_image vectorize

* 0 pitch on height 1

* Revert "0 pitch on height 1"

This reverts commit 58a83e6622.
2026-04-30 16:12:43 -07:00
wozeparrot
eddcd4723b am_smi throttle info (#15997) 2026-04-30 15:28:32 -07:00
chenyu
52c92e15ae no replacement multinomial (#15995)
* no replacement multinomial

Efraimidis–Spirakis

* num_samples == 1 can use fast path
2026-04-30 17:35:26 -04:00
chenyu
e0b09f288f input validation for rand functions (#15990) 2026-04-30 14:00:44 -04:00
nimlgen
11e1a2b89f cleaner and faster run_linear (#15987)
* cleaner and faster run_linear

* x

* assert for now

* x

* x

* sym_infer

* remove sink
2026-04-30 20:15:22 +03:00
qazal
58b34e71bd failing test for llama useless copies (#15989) 2026-05-01 00:55:29 +09:00
George Hotz
0f7e296f5b fix some indexing edge cases (#15988) 2026-04-30 08:05:30 -07:00
nimlgen
6f8b10d251 remove base Runner (#15986)
* remove base Runner

* linters
2026-04-30 13:04:55 +03:00
George Hotz
46a36a838a small dtype shapes fixups (#15984) 2026-04-29 19:40:38 -07:00
chenyu
b73248958a minor rand cleanups (#15982) 2026-04-29 22:22:29 -04:00
chenyu
53a28bafbd rand device seed to its own function (#15979) 2026-04-29 17:21:40 -04:00
Christopher Milan
d07741f1d7 am: look for firmware in /lib/firmware/amdgpu (#15974) 2026-04-29 17:15:09 -04:00
nimlgen
c73e667fc0 remove if for precompiled programs (#15980) 2026-04-29 23:43:36 +03:00
qazal
55915584e5 viz: fix cfg for emulated amd on the null device (#15976)
* simple failing when i test it end to end

* pass

* linter

* assemble
2026-04-30 05:18:09 +09:00
nimlgen
dfd2d07005 remove CompiledRunner (#15970)
* rm usage of CompiledRunner

* more tests

* last

* linter

* sink

* remove

* linter
2026-04-29 22:45:48 +03:00
wozeparrot
0080489abe llama: use env vars (#15978) 2026-04-29 12:37:15 -07:00
qazal
a37b605523 remove arch from asm kernel class (#15977)
* rm arch from kernel

* update other tests

* update abstractions4.py
2026-04-30 03:39:52 +09:00
Christopher Milan
7a79c2948a DEV visible device filter supports hyphenated syntax (#15971) 2026-04-29 14:02:21 -04:00
Christopher Milan
6b9a45568c autogen: better version handling for llvm and libclang (#15975) 2026-04-29 14:01:33 -04:00
chenyu
654e611a29 _bits_to_rand to mixin (#15972) 2026-04-29 13:47:25 -04:00
George Hotz
5f441ecffc unify reduce + reduce_axis (#15973)
* unify reduce + reduce_axis

* fix all tests

* lil cleanups
2026-04-29 10:29:56 -07:00
qazal
b63e0a5f74 viz/sqtt: move amd decoder to extra, don't import from ops_amd (#15969)
* don't import from ops_amd

* start

* cleanup
2026-04-30 00:49:15 +09:00
nimlgen
7787f76dcc get_runner -> get_runtime (#15967)
* get_runner -> get_runtime

* do not use get_runner

* fix

* remove get_tunner

* remove

* fix

* x
2026-04-29 18:29:49 +03:00
chenyu
fb188c3c23 UOp.bitcast noop early return (#15968)
matches Tensor
2026-04-29 09:41:40 -04:00
qazal
30403c1e25 viz/cli: merge DEBUG=6 and -i (#15966)
* print_step contiguous

* merge
2026-04-29 19:52:17 +09:00
qazal
86621e9e7c gate f32_to_fp8 renderer (#15964) 2026-04-29 19:12:46 +09:00
wozeparrot
ef09071073 llama: speed 2 (#15960) 2026-04-28 20:44:37 -07:00
Christopher Milan
e6863a1cc5 autogen: fewer type: ignores (#15956) 2026-04-28 21:58:13 -04:00
chenyu
836af56513 some RandMixin cleanup (#15961)
cleaner to just put inside OpMixin
2026-04-28 19:58:02 -04:00
chenyu
c4bea54e9c _threefry_random_bits to mixin (#15959)
start RandMixin
2026-04-28 19:13:57 -04:00
George Hotz
796fdf9fd8 end has no shape (#15958) 2026-04-28 15:15:48 -07:00
Miguel Villa Floran
b36010c55a DGX Spark and Jetson Thor support (#15939) 2026-04-28 18:08:21 -04:00
Nino Risteski
5eb1fd5d3c cleanup: untrack wait Metal buffers (#15954) 2026-04-28 12:54:59 -07:00
nimlgen
77965a22e5 local optimize as rewrite (#15953)
* local optimize as rewrite

* better

* x

* slighly rename

* fix

* ugh

* remove

* x

* remove

* not weak
2026-04-28 22:51:04 +03:00
qazal
b3f0f8d349 llama: fix missing label_smoothing arg (#15955) 2026-04-29 02:12:14 +09:00
wozeparrot
5e861cd2c4 llama: move llama kernels to llama_kernels (#15952) 2026-04-27 22:48:53 -07:00
Christopher Milan
987b6dd193 python -m tinygrad.device prints interface info (#15950) 2026-04-27 22:15:38 -04:00
qazal
54f00e1013 sqtt: correct rdna4 structs (#15948) 2026-04-28 07:35:50 +09:00
Charlie Kerfoot
890d7be0c3 fix: muon not using device (#15936) 2026-04-27 14:56:48 -07:00
qazal
c58fd85a99 sqtt: add needs_rocprof decorator (#15947)
* sqtt: add needs_rocprof decorator

* version string
2026-04-28 06:22:50 +09:00