189 Commits

Author SHA1 Message Date
George Hotz
a120709671 tighten shape spec for broadcasting (#16206)
* tighten shape spec for broadcasting

* use IndexError, not ValueError

* needs size
2026-05-18 22:12:04 -07:00
George Hotz
3f2d401464 all tests pass with NOOPT=1 (#16257)
* all tests pass with NOOPT=1

* fix a few more

* noopt 100% pass

* noopt 100% pass
2026-05-18 20:39:51 -07:00
chenyu
e694d7f222 more deviceless const prerequisites [pr] (#16256)
* more deviceless const prerequisites [pr]

* remove that

* arange.contiguous -> arange.clone in tests

arange will become deviceless const soon, update tests where it needs to be a buffer
2026-05-18 23:14:12 -04:00
chenyu
c1076ed56c Tensor.device and UOp.device can be None (#16255) 2026-05-18 22:08:10 -04:00
chenyu
d532b4f533 multi alu with deviceless const (#16251) 2026-05-18 19:31:53 -04:00
Christopher Milan
7515824a6d ci: actually use clang-20, enable bfloat16 (#16249) 2026-05-18 19:06:43 -04:00
chenyu
73e6b4963b to and shard is noop for deviceless uop (#16247) 2026-05-18 16:11:10 -04:00
chenyu
db639ebe3e deviceless const from UOp (#16243) 2026-05-18 14:14:12 -04:00
chenyu
5ae4dbd599 make slow tests faster (#16244) 2026-05-18 11:42:02 -04:00
chenyu
8631b6f17d remove use of requires_grad in test/ (#16237) 2026-05-16 17:21:07 -04:00
chenyu
0ddc50d050 do not gate backward on requires_grad (#16230)
DETACH is filtered in _deepwalk. instead of None, it gets 0 grad now
2026-05-16 12:29:49 -04:00
qazal
ebcb7b7cc0 fp8 gemm tests with scale args (#16231)
* update atol

* update fp8 path

* more work

* update profile.sh
2026-05-16 20:47:58 +09:00
wozeparrot
2d48d7ab09 remove more invalid (#16227) 2026-05-16 02:52:27 -07:00
chenyu
d62c1d83c0 remove Tensor.eye override (#16219)
* remove Tensor.eye override

was only needed for requires_grad arg

* README
2026-05-15 15:40:34 -04:00
chenyu
07a172dbbb remove noop requires_grad_ calls (#16213) 2026-05-15 13:31:10 -04:00
chenyu
409bb0c9ad requires_grad cannot be None (#16212)
final goal is to remove requires_grad, first change the default to True, and don't allow None
2026-05-15 02:01:04 -04:00
chenyu
a612b88abb better assert when setitem a refed tensor (#16210)
also decouple from requires_grad
2026-05-14 23:40:29 -04:00
chenyu
a75c14f010 some setitem tests (#16209) 2026-05-14 22:36:25 -04:00
C T
1b779a9058 add gelu approximate="none" (match pytorch) (#16162)
* add gelu approximate="none" (match pytorch)

* lint

* pass through onnx Gelu approximate

* type annotate

* explicit math.sqrt

* keep tinygrad's gelu approximate="tanh" default
2026-05-13 18:53:24 -07:00
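
Note on the two GELU modes named in this commit: the sketch below writes out the standard exact (erf-based) formula and the standard tanh approximation in plain Python. It is an illustration of the math only, not tinygrad's implementation; the function names `gelu_exact` and `gelu_tanh` are hypothetical.

```python
import math

def gelu_exact(x: float) -> float:
    # approximate="none": x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # approximate="tanh": the usual tanh-based approximation of x * Phi(x)
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

if __name__ == "__main__":
    # Print both variants side by side for a few inputs
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(x, gelu_exact(x), gelu_tanh(x))
```

The two variants agree closely but not exactly, which is why the choice of default matters for numerical comparisons against other frameworks.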
chenyu
bdcdf1f1a1 jittable masked_select and nonzero (#16170)
* jittable masked_select and nonzero

make jittable with `size=`, matches jax

* COMPILE_ONLY
2026-05-12 16:39:36 -04:00
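
Note on `size=`: the number of nonzero entries depends on the data, so the output shape of `nonzero` is not known ahead of time. Fixing the output length up front, as JAX's `jnp.nonzero(..., size=...)` does, makes the shape static. The pure-Python sketch below illustrates that idea under the JAX-style convention of truncating extra indices and padding missing ones; `nonzero_fixed` and `fill_value` are hypothetical names for illustration, not tinygrad's API.

```python
def nonzero_fixed(values, size, fill_value=0):
    """Indices of nonzero entries, forced to a fixed length `size`."""
    # Data-dependent part: how many indices we find varies with the input
    idx = [i for i, v in enumerate(values) if v]
    # Truncate if there are more than `size` nonzero entries
    idx = idx[:size]
    # Pad if there are fewer, so the result always has length `size`
    return idx + [fill_value] * (size - len(idx))

if __name__ == "__main__":
    print(nonzero_fixed([0, 3, 0, 5], size=3))
    print(nonzero_fixed([0, 3, 0, 5], size=1))
```

Because the result length no longer depends on the values, the same compiled computation can be reused for every input of the same shape.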
chenyu
7c3e3fa154 fix empty input for masked_select and nonzero (#16168) 2026-05-12 15:36:51 -04:00
George Hotz
64c81dfd24 add all codegen stages to spec_tensor (#16163) 2026-05-12 10:35:38 -07:00
chenyu
f3e3c3851f explicit args to Tensor.rand (#16161)
added requires_grad, other kwargs were silently dropped
2026-05-12 12:53:39 -04:00
nimlgen
e5729935c6 time_call (#16152)
* time_call

* x

* fix caches
2026-05-12 16:58:28 +03:00
qazal
fe39cf148a add Ops.SOURCE test (#16155)
* simple failing test

* raises

* change
2026-05-12 22:49:32 +09:00
chenyu
09fd80fba6 fix randperm and _multi_like drop requires_grad (#16150) 2026-05-11 23:23:34 -04:00
George Hotz
8294d105a7 Update the spec in spec.py to match the current state (#16132)
* start work on specv2

* more spec

* more spec

* fix amd emulator

* more spec

* more

* fix test_uop_graph

* move those

* spec=2

* skip those questionable tests

* ptx fix

* more spec=2

* store

* allow custom function in tensor

* spec 2

* fix beam search for tensor cores

* delete the old specs

* fix import
2026-05-11 20:07:47 -07:00
chenyu
0b02fb6797 Revert "[pr] match torch rmsnorm (#16122)" (#16144)
This reverts commit 692257dd70.
2026-05-11 17:53:42 -04:00
Joshua James Venter
692257dd70 [pr] match torch rmsnorm (#16122)
* [pr] match rmsnorm torch

Signed-off-by: Joshua James Venter <venter.joshua@gmail.com>

* 1e-5

* ops.md

---------

Signed-off-by: Joshua James Venter <venter.joshua@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-05-11 14:36:41 -04:00
Pawan
4dd6ad3514 gradient: add TRUNC backward (#15925)
* gradient: add TRUNC backward

* test: move round quantization gradient to test_ops
2026-05-08 16:27:55 -07:00
chenyu
235044c9d8 Ops.IDIV -> Ops.CDIV, Ops.MOD -> Ops.CMOD (#16093)
* Ops.IDIV -> Ops.CDIV, Ops.MOD -> Ops.CMOD

* ruff
2026-05-07 23:18:15 -04:00
chenyu
072db9924c div to mixin (#16078)
also deleted idiv method
2026-05-07 12:52:37 -04:00
bigyoshi
4024d8438f runtime/graph: avoid core_id runtimevar merge conflicts (#16026)
Co-authored-by: bigyoshi51 <269989564+bigyoshi51@users.noreply.github.com>
2026-05-03 19:16:02 +03:00
chenyu
782d1ff80f Tensor.fmod (#16014)
c-style mod matches torch
2026-05-01 16:02:18 -04:00
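
Note on "c-style mod": in C, the remainder takes the sign of the dividend, whereas Python's `%` operator takes the sign of the divisor. The sketch below contrasts the two using only the standard library; the wrapper names `c_style_mod` and `floored_mod` are hypothetical, and nothing here is tinygrad code.

```python
import math

def c_style_mod(a: float, b: float) -> float:
    # math.fmod follows C semantics: the result has the sign of the dividend `a`
    return math.fmod(a, b)

def floored_mod(a: float, b: float) -> float:
    # Python's % operator: the result has the sign of the divisor `b`
    return a % b

if __name__ == "__main__":
    # The two only differ when the operands have opposite signs
    for a, b in ((7, 3), (-7, 3), (7, -3), (-7, -3)):
        print(a, b, c_style_mod(a, b), floored_mod(a, b))
```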
qazal
8b147a9ed5 minimal repro for llama copies 2 (#16011) 2026-05-01 22:23:47 +09:00
qazal
a29dd7b19b Revert "cleanup: untrack wait Metal buffers (#15954)" (#16010)
* Revert "cleanup: untrack wait Metal buffers (#15954)"

This reverts commit 5eb1fd5d3c.

* regression test fixes
2026-05-01 21:18:19 +09:00
qazal
65879fe1b7 metal synchronize regression test (#16008)
* add test for metal wait=True

* add self.assertRaises
2026-05-01 20:10:57 +09:00
George Hotz
4506688285 split render to render.py (#16002)
* split render to render.py

* move more print
2026-04-30 19:41:14 -07:00
chenyu
52c92e15ae no replacement multinomial (#15995)
* no replacement multinomial

Efraimidis–Spirakis

* num_samples == 1 can use fast path
2026-04-30 17:35:26 -04:00
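
Note on Efraimidis–Spirakis: the commit only names the algorithm, so the following is a minimal standard-library sketch of the textbook version, not the code in this commit. Each item with weight w draws a uniform u and gets the key u ** (1 / w); the k items with the largest keys form a weighted sample without replacement. The function name `sample_without_replacement` and its parameters are hypothetical.

```python
import random

def sample_without_replacement(weights, k, rng):
    """Return up to k distinct indices, favoring larger weights."""
    keys = []
    for i, w in enumerate(weights):
        if w > 0:  # zero-weight items can never be selected
            # Larger weights push the key toward 1, so they tend to rank higher
            keys.append((rng.random() ** (1.0 / w), i))
    # Keep the k largest keys; each index appears at most once
    keys.sort(reverse=True)
    return [i for _, i in keys[:k]]

if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_without_replacement([0.1, 0.0, 5.0, 1.0], k=2, rng=rng))
```

Since every index is keyed once and selection is a top-k over the keys, no index can be drawn twice, which is the property that distinguishes this from sampling with replacement.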
chenyu
e0b09f288f input validation for rand functions (#15990) 2026-04-30 14:00:44 -04:00
nimlgen
11e1a2b89f cleaner and faster run_linear (#15987)
* cleaner and faster run_linear

* x

* assert for now

* x

* x

* sym_infer

* remove sink
2026-04-30 20:15:22 +03:00
qazal
58b34e71bd failing test for llama useless copies (#15989) 2026-05-01 00:55:29 +09:00
nimlgen
dfd2d07005 remove CompiledRunner (#15970)
* rm usage of CompiledRunner

* more tests

* last

* linter

* sink

* remove

* linter
2026-04-29 22:45:48 +03:00
George Hotz
5f441ecffc unify reduce + reduce_axis (#15973)
* unify reduce + reduce_axis

* fix all tests

* lil cleanups
2026-04-29 10:29:56 -07:00
nimlgen
7787f76dcc get_runner -> get_runtime (#15967)
* get_runner -> get_runtime

* do not use get_runner

* fix

* remove get_tunner

* remove

* fix

* x
2026-04-29 18:29:49 +03:00
nimlgen
77965a22e5 local optimize as rewrite (#15953)
* local optimize as rewrite

* better

* x

* slightly rename

* fix

* ugh

* remove

* x

* remove

* not weak
2026-04-28 22:51:04 +03:00
nimlgen
4164666c72 programinfo (#15942)
* programinfo

* fix

* m

* x

* x

* changes

* x

* fix

* rm
2026-04-27 23:12:03 +03:00
nimlgen
96165ff0d1 validate_with_cpu as rewrite (#15938)
* validate_with_cpu as rewrite

* compil

* x

* linter

* moved

* fix
2026-04-26 19:58:53 +03:00
nimlgen
117e9e22dd estimates from graph (#15937)
* estimates from graph

* test

* x
2026-04-26 18:22:53 +03:00
nimlgen
e0ff6cc15c remove old schedule (#15930)
* remove old schedule

* tests

* r

* x
2026-04-25 16:46:36 +03:00