5434 Commits

Author SHA1 Message Date
chenyu
10c262ced8 update tests that use UOp.size (#15753) 2026-04-15 21:58:27 -04:00
qazal
96092d110c fix process_replay Ops.BEAM [pr] (#15752) 2026-04-16 07:35:28 +09:00
Christopher Milan
be8005c5dc DEV: secondary targets (#15748) 2026-04-15 17:26:20 -04:00
chenyu
507c02cecb fix symbolic contiguous_view_offset (#15749)
* fix symbolic contiguous_view_offset

* flatten
2026-04-15 16:54:38 -04:00
nimlgen
164495678c test_graph to use uops (#15746)
* test_graph to use uops

* x

* n
2026-04-15 21:59:41 +03:00
Christopher Milan
1c36878008 DEV: suggest alternatives (#15732) 2026-04-14 23:42:32 -04:00
George Hotz
1ae6528bb6 move schedule into schedule (#15736)
* move schedule into schedule

* callify to root

* sched docs
2026-04-15 11:03:25 +08:00
chenyu
3394d18066 size*itemsize -> nbytes (#15729)
and some UOp.size removal to prep for size to mixin change
2026-04-14 16:27:54 -04:00
George Hotz
2450c8cba8 rename to callify + fix mypy (#15727)
* rename to callify + fix mypy

* update test
2026-04-14 23:43:19 +08:00
George Hotz
359b1582d6 amd: EMU DPP support (#15719)
* EMU DPP support from GPT 5.4

* cleanups

* simple

* nope

* fix
2026-04-14 14:58:41 +08:00
wozeparrot
2b8d303f75 allreduce in precast dtype (#15689) 2026-04-13 20:24:12 -07:00
George Hotz
5683126844 llm: support for tekken tokenizer (#15720) 2026-04-14 10:52:07 +08:00
chenyu
70883a6950 cat the stack to mixin (#15715) 2026-04-13 18:44:39 -04:00
qazal
905b8adc97 viz: cli and server cleanups (#15713)
* update get_profile arg[0]

* uop_to_json arg[0]

* data is standalone in cli
2026-04-14 06:42:29 +09:00
Christopher Milan
d83707ec29 autogen: explicit types (#15679) 2026-04-13 16:54:39 -04:00
chenyu
ac41f15fc1 cumsum to mixin (#15712)
built on top of getitem
2026-04-13 15:06:08 -04:00
chenyu
931d6cc62a basic getitem to mixin (#15697)
* basic getitem to mixin

* cleanup

* fix

* cleanup
2026-04-13 13:04:36 -04:00
George Hotz
7610bdc59e block multistore, it's not supported (#15708) 2026-04-13 20:57:59 +08:00
George Hotz
16f50a40a5 remove REMU from tree (#15706)
* no more compare emulators

* remove remu from tree
2026-04-13 20:43:08 +08:00
qazal
ac027055ef viz: no global state (#15705)
* start viz data

* get_full_rewrites also moves

* update ref_map

* work

* update consumers

* cleaner cli

* linter

* cleanup tests

* back

* better

* sqtt tests
2026-04-13 21:35:20 +09:00
George Hotz
4c1fb18a09 Revert "Revert "Tests for GatedDeltaNetBlock + fix multi after assign issue (…" (#15703)
This reverts commit 0cec42db71.
2026-04-13 19:09:38 +08:00
George Hotz
0cec42db71 Revert "Tests for GatedDeltaNetBlock + fix multi after assign issue (#15700)" (#15702)
This reverts commit 6f5d756282.
2026-04-13 19:06:44 +08:00
George Hotz
6f5d756282 Tests for GatedDeltaNetBlock + fix multi after assign issue (#15700)
* broken after/assign test

* test for GatedDeltaNet

* better comments

* fix issue 1 with multi kernel

* fix 2

* fix

* linter

* public api + cleanup
2026-04-13 18:43:23 +08:00
chenyu
f7ff480fa6 start mixin getitem tests (#15695)
goal is to make Tensor[idx].uop equal to Tensor.uop[idx]
2026-04-12 18:54:33 -04:00
chenyu
e706f408cb suppress test warnings from numpy (#15688) 2026-04-11 22:33:20 -04:00
Graham Robbins
4ca844e96b add Q1_0 gguf type (#15683)
* add Q1_0

* better description

* fix trailing whitespace

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-04-11 18:17:24 +08:00
wozeparrot
457508d5a0 llama: save more 2 (#15681) 2026-04-11 01:03:36 -07:00
George Hotz
b5a9465b13 llm: add support for moonlight (deepseek MLA) (#15466)
* add gguf Q5_0

* it works

* rebase

* simpler test

* class

* less diff

* dicts

* normal names

* simplify

* this

* simpler

* work

* work
2026-04-11 10:32:48 +08:00
chenyu
8e7fcc8ca3 remove _include_initial in _cumalu (#15674)
handle negative pad in caller
2026-04-10 08:33:30 -04:00
George Hotz
9092f2a8c0 llm: add shared_expert and rope_dim support from qwen35 (#15673)
* llm: add shared_expert and rope_dim support from qwen35

* refactor into FFNBlock and TransformerBlock

* norms where they belong
2026-04-10 19:18:27 +08:00
b1tg
9ab1415937 llm: fix streaming UTF-8 decode (#15653) 2026-04-10 17:01:02 +08:00
Christopher Milan
dbc23e8a1b move HCQ_VISIBLE_DEVICES into DEV (#15668) 2026-04-09 22:01:35 -04:00
Christopher Milan
d08c76d9cb c.Struct cleanup (#15640) 2026-04-08 20:07:16 -04:00
chenyu
4cf2759fc8 fix merge_reduce_ends (#15659)
* fix merge_reduce_ends

same range with different nesting should not merge, like cumsum twice should not merge

* skip that
2026-04-08 17:20:01 -04:00
qazal
71c83cc3f6 viz: put OTHER_ on the wave row (#15650)
* viz: put OTHER_ on the wave row

* update tests

* cleanup cli
2026-04-08 23:13:44 +09:00
qazal
3ac16b3bea viz: add wmma row, update exec duration logic (#15646)
* viz: split wmma to its own row, fix duration logic

* regs

* decrease number of loops, add pickle

* assert overlaps
2026-04-08 20:24:23 +09:00
George Hotz
35e3983840 Add Q5_0, Q5_1, and bfloat16 GGUF types (#15644) 2026-04-08 17:16:19 +08:00
qazal
39a029ec55 remove ASM_GEMM context var (#15645) 2026-04-08 18:02:40 +09:00
qazal
dc6a51e44d viz: add # of bytes to sdma (#15639)
* viz: add # of bytes to sdma

* update test_viz
2026-04-08 17:43:37 +09:00
wozeparrot
70dbd35023 llama: move custom_kernel into flat_llama (#15643) 2026-04-08 00:19:14 -07:00
George Hotz
f930579b7a llm: change the default port to 8000 so you can remember it (match vLLM) 2026-04-08 11:25:38 +08:00
George Hotz
2b01ca59dd USB driver for custom ASM firmware (#15597)
* USB driver for custom ASM firmware

* timeout

* fix mypy

* pcie mem read

* flip in f/w

* one tx

* litle endian

* autodetect custom

* mock bypass

* lint

* clean
2026-04-07 13:45:41 +08:00
Christopher Milan
19e96497ee interface in DEV (#15620) 2026-04-06 19:59:28 -04:00
qazal
8ba58304f7 viz: reenable tests (#15626) 2026-04-07 07:52:44 +09:00
chenyu
2f7d085450 shared _normalize_indices for getitem (#15625)
* shared _normalize_indices for getitem

* list
2026-04-06 17:45:36 -04:00
chenyu
a444be172d lower fuzz_symbolic_symbolic_div timeout (#15619)
mitigate timeout crash due to high total time
2026-04-06 12:58:29 -04:00
chenyu
01b49c8647 support int operand for shifts (#15618)
matches torch/jax, also symbolic rule to remove mask
2026-04-06 12:32:12 -04:00
Valtteri Valo
86c4431d74 add gpu_family detection to Metal, target MSL 4.0 on macOS 26+ (#15079)
use supportsFamily API to detect GPU generation instead of parsing
ICB debug description strings. also adds metal4.0 compiler target.
2026-04-06 06:51:38 +08:00
Andrew Cappelli
e39cfe685a validate lr, momentum, weight_decay in optimizers (#15576) 2026-04-06 06:37:34 +08:00
nimlgen
e3986a6b74 mlx: init runtime (#15612)
* mlx: init

* x

* swap
2026-04-05 22:52:29 +03:00