1198 Commits

Author SHA1 Message Date
George Hotz
3f2d401464 all tests pass with NOOPT=1 (#16257)
* all tests pass with NOOPT=1

* fix a few more

* noopt 100% pass

* noopt 100% pass
2026-05-18 20:39:51 -07:00
chenyu
754344087a assign for deviceless const source (#16248) 2026-05-18 17:39:53 -04:00
chenyu
dcee90aa3f remove requires_grad use in extra/examples (#16238)
except the ones fed into optimizer
2026-05-16 18:40:26 -04:00
chenyu
8631b6f17d remove use of requires_grad in test/ (#16237) 2026-05-16 17:21:07 -04:00
chenyu
0ddc50d050 do not gate backward on requires_grad (#16230)
DETACH is filtered in _deepwalk. instead of None, it gets 0 grad now
2026-05-16 12:29:49 -04:00
chenyu
07a172dbbb remove noop requires_grad_ calls (#16213) 2026-05-15 13:31:10 -04:00
chenyu
c6cf9e8f0c remove test_svd_nonfull_5_5 (#16217)
flaky, kinda overlap with test_svd_general
2026-05-15 13:10:02 -04:00
chenyu
409bb0c9ad requires_grad cannot be None (#16212)
final goal is to remove requires_grad, first change the default to True, and don't allow None
2026-05-15 02:01:04 -04:00
chenyu
a75c14f010 some setitem tests (#16209) 2026-05-14 22:36:25 -04:00
chenyu
ffa1aac7b1 gradient for STORE/AFTER ala clone (#16205) 2026-05-14 20:17:27 -04:00
chenyu
09096ea565 test_gradient_through_clone (#16203)
backward through clone crashes now
2026-05-14 19:26:47 -04:00
b1tg
3c806ff406 clean up gguf (#16160) 2026-05-12 21:16:10 -07:00
chenyu
38d407fd58 simplify svd more (#16181)
all the slowness is scheduling
2026-05-12 23:48:22 -04:00
chenyu
2172363be5 don't use Tensor indexing in svd (#16174)
prepare mixin, also about 4X faster for 8x8 input
2026-05-12 21:56:19 -04:00
wozeparrot
a613bcfc6d allow after on contiguous in spec (#16169)
* feat: allow after on contiguous

* feat: add test
2026-05-12 13:11:44 -07:00
chenyu
da3b7e89a4 atol in test_custom_kernel_multi_output_backward_interacting (#16166) 2026-05-12 14:42:12 -04:00
George Hotz
8294d105a7 Update the spec in spec.py to match the current state (#16132)
* start work on specv2

* more spec

* more spec

* fix amd emulator

* more spec

* more

* fix test_uop_graph

* move those

* spec=2

* skip those questionable tests

* ptx fix

* more spec=2

* store

* allow custom function in tensor

* spec 2

* fix beam search for tensor cores

* delete the old specs

* fix import
2026-05-11 20:07:47 -07:00
chenyu
3942a80f66 fix wrong kwargs passed into rands (#16149)
working towards explicit args for these
2026-05-11 22:22:06 -04:00
chenyu
63c1f00b80 disable test_svd_general again (#16146)
flaky on CI
2026-05-11 19:24:32 -04:00
chenyu
fbe8be0b8b style cleanup to Tensor.qr and svd (#16142)
* style cleanup to Tensor.qr and svd

same kernels

* more

* enable
2026-05-11 17:16:59 -04:00
wozeparrot
4d1a9dca41 fix: don't copy precompiled custom kernel outputs (#16084) 2026-05-07 14:02:38 -07:00
nimlgen
5fa0016ffc supports_exec_item -> supports_uop (#16033) 2026-05-05 22:41:13 +03:00
wozeparrot
419d525553 feat: handle multioutput kernel grads (#16028) 2026-05-02 22:31:45 -07:00
George Hotz
5f441ecffc unify reduce + reduce_axis (#15973)
* unify reduce + reduce_axis

* fix all tests

* lil cleanups
2026-04-29 10:29:56 -07:00
nimlgen
4164666c72 programinfo (#15942)
* programinfo

* fix

* m

* x

* x

* changes

* x

* fix

* rm
2026-04-27 23:12:03 +03:00
nimlgen
96165ff0d1 validate_with_cpu as rewrite (#15938)
* validate_with_cpu as rewrite

* compil

* x

* linter

* moved

* fix
2026-04-26 19:58:53 +03:00
nimlgen
d3378010ee schedule() -> schedule_linear() in tests (batch 1) (#15915)
* schedule_with_vars -> linear_with_vars in tests

* tests batch 1

* batch 2

* estimate_uop

* simpler

* rm
2026-04-24 23:40:53 +03:00
b1tg
af93a677ae llm: glm 4.5 air (#15771)
* llm: glm 4.5 air

* clean

* clean

* remove gguf_size
2026-04-22 22:47:37 +08:00
Christopher Milan
99a0debd62 Device.count() (#15842) 2026-04-21 16:46:38 -04:00
chenyu
9192c93b7e Tensor.invalid -> Tensor.invalids (#15849)
matches ones and zeros, and to not share name with UOp.invalid
2026-04-21 11:19:51 -04:00
nimlgen
01ac1c8c15 remove all run_schedule from tests (#15846) 2026-04-21 12:02:10 +03:00
Christopher Milan
1a8ba4cbd6 CPU renderers use arch (#15839) 2026-04-20 23:38:29 -04:00
George Hotz
5819c0abed fix gc in gguf (#15820)
* fix gc in gguf

* fix mypy
2026-04-20 10:15:03 +08:00
George Hotz
67ed4c4eb3 move gguf stuff from nn/state.py to llm/gguf.py (#15783)
* move gguf stuff from nn/state.py to llm/gguf.py

* docs
2026-04-20 09:41:43 +08:00
Kartik Vashishta
a1696e8413 objc: fix _classmethods_ dispatch flag (#14854)
* objc: fix _classmethods_ dispatch flag

* test: add objc _classmethods_ regression
2026-04-20 09:35:03 +08:00
chenyu
5bdfd4883f update test_assign (#15809)
clean up old skips and update tests
2026-04-18 21:25:44 -04:00
Christopher Milan
6adf4c3cd9 MOCKGPU interfaces (#15796) 2026-04-17 21:56:29 -04:00
chenyu
8da308573f update test_assign_changes_alt with clone (#15802) 2026-04-17 20:17:37 -04:00
qazal
9f2a578e26 unskip TestCall.test_call_gemm_uop [pr] (#15786) 2026-04-17 16:18:51 +03:00
George Hotz
e1d13bc4fe add GGUF IQ4_XS support (#15766)
* add GGUF IQ4_XS support

* gguf 21

* gguf 21

* use plus

* ggml_common autogen for constant arrays

* fix

* ggml_common in autogen

* inline
2026-04-17 14:43:39 +08:00
George Hotz
a9b6cfece0 refactor llm into files (#15780)
* refactor llm into files

* chat.html

* tokenizer cleanup

* cleanup

* tests
2026-04-17 12:33:11 +08:00
George Hotz
ec00cefa5b llm is the only app (#15779)
* tinygrad/llm is the only app

* upd pyproject

* claude refs

* scoping

* min diff
2026-04-17 10:44:48 +08:00
chenyu
f0c12a2004 another form of assign to itself (#15770) 2026-04-16 15:17:19 -04:00
chenyu
d147e2a549 update test_nested_after_contiguous_store (#15763)
add kernel counts and some TODOs
2026-04-16 09:59:26 -04:00
George Hotz
f57380cbc2 simplify GatedDeltaNetBlock using two state tensors (#15704)
* test double after

* simpler ssm

* no double test
2026-04-16 21:14:00 +08:00
George Hotz
d1cce7a476 put the ranges on store instead of after (#15759)
* put the ranges on store instead of after

* better assert

* fix stuff

* comment out slow rules i don't understand

* simpler rule

* closer

* return false for store

* fix loop

* only a few schedule failures remain

* remove stores to self

* all tests pass locally

* remove junk

* regression test and fix

* better test, bump broken torch count

* bugfix with regression test

* new fusion is better
2026-04-16 19:06:40 +08:00
George Hotz
d24466c844 CALL with return value is FUNCTION (#15758)
* CALL with return value is FUNCTION (GPT try)

* cleanups
2026-04-16 13:25:07 +08:00
chenyu
10c262ced8 update tests that use UOp.size (#15753) 2026-04-15 21:58:27 -04:00
George Hotz
1ae6528bb6 move schedule into schedule (#15736)
* move schedule into schedule

* callify to root

* sched docs
2026-04-15 11:03:25 +08:00
wozeparrot
2b8d303f75 allreduce in precast dtype (#15689) 2026-04-13 20:24:12 -07:00