5694 Commits

Author SHA1 Message Date
chenyu
fbed36fa15 jit graph handle input==output aliasing (#14287)
a position that wasn't an input during capture should never become an input during execution, but the graph cannot tell this from jit_cache and input_buffers alone
2026-01-22 11:37:41 -05:00
chenyu
8bb61c2490 stronger test_graph_input_output_aliasing (#14282)
* stronger test_graph_input_output_aliasing

* confirmed failure
2026-01-22 09:59:34 -05:00
chenyu
4de107b764 jit graph bug when input is output (#14278)
* jit graph bug when input is output

wrong result in llm

* not just metal
2026-01-21 18:49:52 -05:00
chenyu
6279ae4a94 remove llm generate always reset start_pos (#14276)
* remove llm generate always reset start_pos

by itself seems like a bug, also added a test to repro forward_jit.reset() issue

* issue is jit graph, so revert that test
2026-01-21 16:54:30 -05:00
chenyu
574d171fa6 fix onnx Pad constant_value=None (#14271)
also removed a dead branch in _resolve_pool_pads
2026-01-21 11:51:34 -05:00
chenyu
e64111ad08 update all_same [pr] (#14270)
add type annotation and unit test
2026-01-21 11:26:15 -05:00
chenyu
9ad3c865ac fix bug in logsumexp keepdim=True (#14268)
2026-01-21 09:49:55 -05:00
George Hotz
41d00a046d add device to local, fix PCONTIG=2 (#14266)
* add device to local, fix PCONTIG=2

* regression test

* remove the device when we render

* viz slowness

* no long
2026-01-21 22:12:18 +09:00
nimlgen
22af7132cd fix test_dev_jitter_matrix (#14255)
2026-01-20 20:07:51 +03:00
C T
26f8b12e01 Whisper audio helpers (mel filters in tinygrad) (#13478)
* add whisper audio helpers for stft/mel/resample

* cleanup

* add whisper stft test

* make only stft test explicitly depend on librosa

* extract sinc_window_kernel

* dehardcode device

* use same device argument

* simplify

* type annotate

* ruff format audio_helpers.py

* ruff format test_whisper.py

* add WHISPER_NEW_STFT

* rename

* undo ruff format changes

* use new stft and mel for whisper

* remove stft test that depends on librosa

* remove whitespace

* add Tensor.log10 with test\test_ops.py::TestOps::test_log10

* use Tensor.log10

* fix lint

* future: remove unused STFT class

* future: remove resample code since it isn't used (yet)

* match openai with pad_mode="reflect"

* pad_to

* future: cut resample leftovers

* cleanup

* add mel tests

* future: cut stft

* future: cut non-mel prep_audio changes

* reduce diff

* move audio_helpers.py to examples

* reduce whitespace

* fix imports

* reduce whitespace

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-20 10:50:02 -05:00
George Hotz
5e24643889 minor import speedups (#14244)
* minor import speedups

* server stuff in server places

* pre-commit

* fix
2026-01-20 15:05:36 +09:00
qazal
b1c5a242b7 Revert "move is_dtype_supported logic to renderer (#14188)" (#14237)
This reverts commit 161fee9a48.
2026-01-20 12:19:14 +09:00
chenyu
9ea63d7d52 failed test case for onnx IF with jit (#14235)
silently fails now since onnx treats IF cond as a const
2026-01-19 18:10:05 -05:00
George Hotz
31bcbed6bb AMD_DISABLE_SDMA for testing with -n12 (#14216)
2026-01-19 16:10:30 +09:00
Christopher Milan
161fee9a48 move is_dtype_supported logic to renderer (#14188)
* move is_dtype_supported logic to renderer

* fix CPU_COUNT

* mypy happy

* early import libclang too with llvm

* run with debug

* skip autogen tests if MTLCompiler or llvm is loaded

* run autogen tests separately in CI

* lint
2026-01-18 22:37:04 -05:00
chenyu
67d9712ef6 jit copy aliased output if it's read later (#14210)
2026-01-18 18:48:59 -05:00
chenyu
97333b1954 jit footguns test case on assign with same buffer outputs (#14209)
related https://github.com/tinygrad/tinygrad/issues/13364
2026-01-18 16:01:09 -05:00
chenyu
e7c2df9113 improve consecutive Tensor indexing (#14208)
* improve consecutive Tensor indexing

instead of O(idx_counts*src_dims), it can just be O(idx_counts)

* test correctness
2026-01-18 15:14:33 -05:00
chenyu
c7b8f6496f remove dtypes.index_like and dtypes.fields [pr] (#14207)
barely used, so just use inline and DTYPES_DICT
2026-01-18 11:49:01 -05:00
chenyu
5e6a72c33f new Onnx Gather (#14187)
instead of assuming const indices, check if it showed as a const
2026-01-16 22:24:07 -05:00
chenyu
ab244c7f81 onnx Gather should not assume indices to be const (#14185)
* onnx Gather should not assume indices to be const

added a failed test case

* just list
2026-01-16 20:55:00 -05:00
wozeparrot
a879b54234 tk: fa jit fix (#14170)
2026-01-16 16:38:45 -08:00
Christopher Milan
a021b84604 autogen: fix enum (#14171)
2026-01-16 01:30:11 -05:00
chenyu
14e9a71a41 move test_assign to unit (#14165)
scheduling these should not depend on device
2026-01-15 17:10:13 -05:00
Christopher Milan
0cb024a5bb remove ctypes.Structure (#13651)
2026-01-15 05:06:22 -05:00
qazal
164bc678a6 scheduler: sched_cache bugfix for different Tensor.custom_kernel schedules (#14161)
* simplest failing test

* min fix

* same function reuses the cache

* SPEC=2 never worked for custom_kernel
2026-01-15 14:59:14 +09:00
qazal
b46da603fe codegen/custom_kernel: do not attach KernelInfo to user program (#14160)
2026-01-15 14:01:48 +09:00
chenyu
add7da268f multiple slice assign test (#14157)
GANing test cases
2026-01-14 21:08:03 -05:00
chenyu
1381daac06 many more failed assign tests (#14153)
assign is quite broken
2026-01-14 16:20:28 -05:00
chenyu
899a56446e failed assign test cases with write before read (#14148)
slice assign write before read fails now. this is why kv cache needs a realize
2026-01-14 10:30:50 -05:00
chenyu
2a2c1eacf6 disable fast_idiv on metal (#14137)
there's a metal compiler bug, which was the root cause of keccak needing a contiguous hack
2026-01-13 21:40:40 -05:00
wozeparrot
a92778aa0c tk: fa multi fix (#14134)
2026-01-13 17:22:15 -08:00
chenyu
fe00682502 clean up svd tests (#14133)
removed from test_ops and added to TestTorchBackend
2026-01-13 16:32:21 -05:00
chenyu
e610821c52 Tensor.cummin and Tensor.nonzero (#14131)
2026-01-13 15:09:56 -05:00
chenyu
176a934ddd Tensor.diagonal support offset and dims (#14130)
2026-01-13 14:49:06 -05:00
qazal
79d00521f8 viz: fix cfg err when endpgm is in the middle of stream (#14128)
* kernel from beautiful_mnist

* minimal test

* correct way to do this

* rm that
2026-01-14 02:00:34 +09:00
qazal
fd10fd245a viz: cfg tokenizer fix and unit tests (#14121)
* output Ops.BINARY

* failing test for the cfg

* dsl renamed to offset and sz

* add better asserts

* move the note
2026-01-13 15:08:55 +09:00
chenyu
05fcb57696 also return index in Tensor.cummax (#14117)
* also return index in Tensor.cummax

* fix
2026-01-12 22:42:10 -05:00
wozeparrot
7c967399a4 tk: add failing test for fa multidevice (#14116)
2026-01-12 19:11:09 -08:00
George Hotz
330a0b686e assembly/amd: clean up dsl and make type verification strict (#14102)
* assembly/amd: start newdsl

* work

* newdsl upd

* Reg is p nice

* cleaner

* work

* getting clean

* all fields

* more BitFields

* redo the pdfs with dsl2 syntax

* no lit

* cleanups

* more defaults

* fix get and remove crap

* aliases

* ugly but kind of works

* NULL, not rawimm

* clean up defaults

* only dsl

* asm fixes

* lit fixup

* more lit

* cleanups

* olddsl

* single pcode dict

* emu sort of works

* trash test

* global is global

* types property

* reg mods

* fix a few tests

* remove monkey patch

* fixes

* less hacks in tests

* less hacks in tests

* 4 test failures

* hw tests all pass

* fix compare emulator

* fix some tests

* 3 more

* fix and shorten sqtt

* handwritten

* fix validation

* test corrections

* all types validate

* fix dsl2 tests

* fix bugs in disasm

* skips on cdna

* work

* repr with reg[]

* fix bitfield tests

* merge pcodes in dsl

* remove override

* disasm uses inst.types

* simpler
2026-01-13 08:52:16 +09:00
C T
a8c821f45e add Tensor.log10 with test\test_ops.py::TestOps::test_log10 (#14113)
2026-01-12 13:45:47 -05:00
chenyu
6b0a9f5ee6 don't strip sink in to_uops_list [pr] (#14111)
2026-01-12 11:19:03 -05:00
chenyu
cad7feec02 more onnx ops (#14104)
HannWindow, HammingWindow, BlackmanWindow, Hardmax, LpNormalization
2026-01-12 09:11:13 -05:00
chenyu
9973a81356 add channels_last to QLinearGlobalAveragePool (#14094)
and other minor cleanups
2026-01-10 18:38:19 -05:00
chenyu
35c9701df0 update outdated tests and comments (#14090)
2026-01-10 01:00:48 -05:00
chenyu
92246ea731 update tests, WEBGPU=1 pytest . passes (#14089)
* update tests, `WEBGPU=1 pytest .` passes

* minor update
2026-01-10 00:03:02 -05:00
chenyu
c34c6d9468 fix wgsl packed_store can drop valid (#14088)
* fix wgsl packed_store can drop valid

* fix
2026-01-09 15:22:06 -05:00
chenyu
eacccc5ace more disk assign tests (#14087)
covers more edge cases
2026-01-09 14:14:52 -05:00
chenyu
ed295e74dc don't skip gguf test if ggml is not installed (#14086)
* don't skip gguf test if ggml is not installed

should just let it fail

* fix
2026-01-09 12:05:58 -05:00
chenyu
cff33c8d78 add some disk assign tests (#14085)
2026-01-09 11:50:59 -05:00