Commit Graph

1221 Commits

Author SHA1 Message Date
chenyu
b38fc43b07 assert assign dtype mismatch for disk [pr] (#14473)
the disk hack is generally wrong, now force bitcast on the source before assign
2026-01-31 17:08:54 -05:00
chenyu
ced886f26c failed test case for assign into bitcast (#14469)
* failed test case for assign into bitcast

DISK assign has custom hack for this. need to fix before we can unify assign

* test_assign_bitcast_different_size
2026-01-31 14:26:47 -05:00
chenyu
c765641215 remove unused allow_any_len [pr] (#14464)
STORE has 2 src, RESHAPE has 2 src, BUFFER has 2 src
added some tests for the untested allow_any_len
2026-01-31 11:05:42 -05:00
chenyu
b4f5a51ebb move tests to unit (#14463)
test_uop_graph does not need device, test_memory_planner can use NULL
2026-01-31 10:49:31 -05:00
chenyu
99b44121bc failed test case for non-consecutive disk read (#14455)
silently fail now
2026-01-30 23:44:04 -05:00
chenyu
03613e83ad update TestTensorMetadata (#14443)
run with SCACHE=0 some more TODOs
2026-01-30 12:39:01 -05:00
chenyu
26f5c00265 move TestTensorMetadata to unit (#14442) 2026-01-30 12:14:21 -05:00
George Hotz
838cd078bc use atomics for embedding backward (#14400)
* embedding is slow

* failing

* float is fine

* null

* it fails

* simplify embedding with broadcasting

* ATOMIC_ADD incoming

* min change

* simpler test

* better test

* fix test

* real test

* simpler

* cleanups

* types and names

* _zero_kernel

* grad multi

* hack

* none

* multi unshard

* more for call

* don't tag in call

* good

* call_multi

* call_multi wow claude is useless

* embedding backward mutli test

* test passes

* fix as_param

* shape_to_shape_arg

* add clip

* before cast

* fix spec=2, use atomics
2026-01-30 18:10:59 +08:00
George Hotz
7a9dee4e50 add call/param UOps (#14433)
* add call/param UOps

* resolve call

* skip that for now

* grad on call

* fix tests
2026-01-30 14:51:45 +08:00
chenyu
86a204d22a allow Tensor setitem input to be list/tuple (#14432)
matches assign, and generally matches numpy
2026-01-29 21:26:58 -05:00
chenyu
ddc041854b failed test case for disk setitem (#14426)
strided setitem is wrong
2026-01-29 14:54:19 -05:00
Christopher Milan
5e36482314 decompose long to ints where unsupported, try 2 (#14383) 2026-01-27 23:20:43 -05:00
George Hotz
065b95cfb0 Revert "add retry to fetch (#14370)" (#14385)
This reverts commit dc4d7f2d55.
2026-01-28 09:35:37 +08:00
Eitan Turok
dc4d7f2d55 add retry to fetch (#14370) 2026-01-27 14:04:25 -08:00
Christopher Milan
289a3e415e also skip test_nonoverlapping_shrink_assignment (#14382) 2026-01-27 16:26:26 -05:00
chenyu
db010a31be IGNORE_OOB -> CHECK_OOB [pr] (#14374)
flip the meaning
2026-01-27 12:20:59 -05:00
chenyu
c22667b0c4 also skip test_overlapping_shrink_assignment_reverse (#14375)
crashing
2026-01-27 12:20:39 -05:00
George Hotz
0ced258726 HOTFIX: skip crashing assign test 2026-01-27 20:35:17 +08:00
imaolo
14574c68fa Add ContextVar to disable the scheduler cache (#14257)
* add scheduler cache ContextVar

* test scheduler cache context var

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-01-27 19:55:29 +08:00
Christopher Milan
2e72625652 Revert "decompose dtypes.long to ints where unsupported (#14261)" (#14362) 2026-01-27 02:04:59 -05:00
Christopher Milan
0793319929 decompose dtypes.long to ints where unsupported (#14261)
* add works

* use carry not overflow

* bitwise ops

* use tag instead of vec

* cleaner

* mul somewhat works

* mul actually works

* SUB and NEG work

* SHL/SHR

* ulong support

* this should work?

* oops

* fix indexing

* all ALU mostly works

* refactor

* test_dtype passing

* signed division works

* format

* clean

* some tests

* ruff
2026-01-26 18:34:13 -05:00
chenyu
d641e63189 improve min/max for AND (#14356) 2026-01-26 15:44:18 -05:00
chenyu
f16372487a fix assign hazard on shrink (#14355)
* fix assign hazard on shrink

possible to have race if both assign src and dest are shrink

* test_nonoverlapping_shrink_assignment
2026-01-26 14:46:30 -05:00
chenyu
823bc17fb5 failed test case for shrink overlap assigns (#14350)
* failed test case for shrink overlap assigns

current logic can create a race resulted in wrong output

* skip for now
2026-01-26 11:58:45 -05:00
George Hotz
984cdc4840 add wrapper class for the -0.0 != 0.0 issue (#14339)
* add wrapper class for the -0.0 != 0.0 issue

* fixes

* spec fix

* missed one
2026-01-26 16:52:37 +08:00
George Hotz
cc49e47ea2 tinygrad changes from ucode (#14336)
* tinygrad changes from ucode

* dtype
2026-01-26 11:30:18 +08:00
chenyu
cb69b7b2b2 comment out fold_where_closure (#14316) 2026-01-24 10:15:42 -05:00
chenyu
e65bc7a7c5 where closure folding (#14304) 2026-01-23 10:55:13 -05:00
chenyu
5f32f7a06b fix winograd padding order (#14294) 2026-01-22 23:00:14 -05:00
chenyu
6279ae4a94 remove llm generate always reset start_pos (#14276)
* remove llm generate always reset start_pos

by itself seems like a bug, also added a test to repro forward_jit.reset() issue

* issue is jit graph, so revert that test
2026-01-21 16:54:30 -05:00
chenyu
e64111ad08 update all_same [pr] (#14270)
add type annotation and unit test
2026-01-21 11:26:15 -05:00
George Hotz
5e24643889 minor import speedups (#14244)
* minor import speedups

* server stuff in server places

* pre-commit

* fix
2026-01-20 15:05:36 +09:00
qazal
b1c5a242b7 Revert "move is_dtype_supported logic to renderer (#14188)" (#14237)
This reverts commit 161fee9a48.
2026-01-20 12:19:14 +09:00
Christopher Milan
161fee9a48 move is_dtype_supported logic to renderer (#14188)
* move is_dtype_supported logic to renderer

* fix CPU_COUNT

* mypy happy

* early import libclang too with llvm

* run with debug

* skip autogen tests if MTLCompiler or llvm is loaded

* run autogen tests separately in CI

* lint
2026-01-18 22:37:04 -05:00
chenyu
e7c2df9113 improve consecutive Tensor indexing (#14208)
* improve consecutive Tensor indexing

instead of O(idx_counts*src_dims), it can just be O(idx_counts)

* test correctness
2026-01-18 15:14:33 -05:00
chenyu
c7b8f6496f remove dtypes.index_like and dtypes.fields [pr] (#14207)
barely used, so just use inline and DTYPES_DICT
2026-01-18 11:49:01 -05:00
Christopher Milan
a021b84604 autogen: fix enum (#14171) 2026-01-16 01:30:11 -05:00
chenyu
14e9a71a41 move test_assign to unit (#14165)
scheduling these should not depend on device
2026-01-15 17:10:13 -05:00
Christopher Milan
0cb024a5bb remove ctypes.Structure (#13651) 2026-01-15 05:06:22 -05:00
qazal
164bc678a6 scheduler: sched_cache bugfix for different Tensor.custom_kernel schedules (#14161)
* simplest failing test

* min fix

* same function reuses the cache

* SPEC=2 never worked for custom_kernel
2026-01-15 14:59:14 +09:00
chenyu
35c9701df0 update outdated tests and comments (#14090) 2026-01-10 01:00:48 -05:00
chenyu
92246ea731 update tests, WEBGPU=1 pytest . passes (#14089)
* update tests, `WEBGPU=1 pytest .` passes

* minor update
2026-01-10 00:03:02 -05:00
chenyu
eacccc5ace more disk assign tests (#14087)
covers more edge cases
2026-01-09 14:14:52 -05:00
chenyu
ed295e74dc don't skip gguf test if ggml is not installed (#14086)
* don't skip gguf test if ggml is not installed

should just let it fail

* fix
2026-01-09 12:05:58 -05:00
chenyu
cff33c8d78 add some disk assign tests (#14085) 2026-01-09 11:50:59 -05:00
Garret Castro
16b652302e skip bf16 test if not supported by device (#14070) 2026-01-08 13:37:24 -05:00
Christopher Milan
b2a0b9c551 autogen: dump patch in CI (#14010)
* autogen: don't fast-fail, produce patch artifact on differences

All verification steps now use continue-on-error to run completely.
Each job generates a patch artifact containing all differences found.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* add gen from header test

* fix tests

* fail if diff

* add forward decl autogen test

* remove confusing/wrong comments

* macos unittests set LIBCLANG_PATH

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-04 22:38:12 -05:00
chenyu
cfb8bf5814 faster image load (#13977)
sometimes image load does not need to init with NAN
2026-01-04 13:09:59 -05:00
chenyu
8003db2a28 test case of NOOP store load folding (#13997) 2026-01-03 14:39:26 -05:00
chenyu
2e2b5fed12 fix misspellings (#13976) 2026-01-02 10:37:38 -05:00