Commit Graph

22 Commits

Author SHA1 Message Date
Christopher Milan
172f9493e1 move is_dtype_supported to renderer (#16226) 2026-05-20 21:19:37 -04:00
chenyu
8631b6f17d remove use of requires_grad in test/ (#16237) 2026-05-16 17:21:07 -04:00
chenyu
0ddc50d050 do not gate backward on requires_grad (#16230)
DETACH is filtered in _deepwalk. instead of None, it gets 0 grad now
2026-05-16 12:29:49 -04:00
qazal
ebcb7b7cc0 fp8 gemm tests with scale args (#16231)
* update atol

* update fp8 path

* more work

* update profile.sh
2026-05-16 20:47:58 +09:00
chenyu
409bb0c9ad requires_grad cannot be None (#16212)
final goal is to remove requires_grad, first change the default to True, and don't allow None
2026-05-15 02:01:04 -04:00
chenyu
7a1adfd2aa update Tensor.allclose to return Tensor (#15904)
matches jax
2026-04-24 08:27:17 -04:00
Christopher Milan
6adf4c3cd9 MOCKGPU interfaces (#15796) 2026-04-17 21:56:29 -04:00
qazal
39a029ec55 remove ASM_GEMM context var (#15645) 2026-04-08 18:02:40 +09:00
wozeparrot
70dbd35023 llama: move custom_kernel into flat_llama (#15643) 2026-04-08 00:19:14 -07:00
wozeparrot
7e54992bf6 fp8 llama (#15588)
Co-authored-by: qazal <qazal.software@gmail.com>
2026-04-04 18:24:57 -07:00
Christopher Milan
645d45d968 DEV has arch (#15577)
Co-authored-by: Comma Device <device@comma.ai>
2026-04-03 19:17:19 -04:00
Christopher Milan
0ed8d9271d Renderers accept Target or nothing (#15590) 2026-04-03 01:09:41 -04:00
qazal
fefb0ebc2a gemm/asm: fp8 cleanups (#15580)
* normal gemm here

* s/dtypes.fp8e4m3/FP8_DTYPE

* gemm_bw

* device UOp stays NULL
2026-04-02 19:02:38 +09:00
qazal
8feb8edc68 gemm/asm: add fp8 support to cdna asm_gemm (#15542)
* work

* hmm, mixins

* rhs_transposed

* also fix the dtype

* check for hipcc

* Exception

* select dev

* default
2026-03-31 19:32:54 +09:00
qazal
f88e255cea gemm/asm: split and parameterize dtype in llama gemm tests (#15408)
* gemm/asm: more tests for emulator, parameterize llama gemm tests

* bf16 atol
2026-03-31 17:12:44 +09:00
Christopher Milan
bc180a963c deprecate <dev>=1 in favor of DEV=<dev> (#15467)
* start work on target

* add test

* update actions to use DEV

* update docs

* update readmes

* tests need that too

* update example

* update tests (comments)

* fix that test

* ruff

* mypy

* oops

* remove getenvs

* don't add Target yet

* and the test

* lint

* and docs

* more stuff

* assert

* few more fixes

* test assert
2026-03-26 03:48:03 -04:00
qazal
176ad47d7d cdna4 emulator testing ASM_GEMM in CI (#15373)
* cdna emulator work

* accvgprs

* cdna passes most tests

* ruff

* add cdna4 to tests

* cdna emu

* crash

* pass?

* work

* gen

* clean up wave_size access

* asm_gemm passes

* remove acc from dsl.py, emulator can keep its different reg file

it's purely an encoding here, the ASM_GEMM already encodes acc srcs with v[], this can
be cleaned up later, but not functionally required for emulator.

* split asm_gemm tests to ones fast on the emulator

* don't do that

* 124 stays null on rdna

* the segfault was because of hw regs, not this

* Revert "clean up wave_size access", it's explicitly tested

This reverts commit 1202ff5787.

* nullcopyout

---------

Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-03-20 05:51:30 +09:00
qazal
33bd33e783 sqtt: add CDNA ops enum, show in viz (#15140) 2026-03-17 09:38:42 +09:00
qazal
5cd1daa3bc cdna asm_gemm in one file, remove old rdna3 asm (#15281) 2026-03-16 04:32:30 +09:00
wozeparrot
c35de9bd68 asm_gemm: support more sharding (#15002) 2026-03-02 23:16:37 -08:00
qazal
5b6fcd1cda gemm/asm: smallest cdna4 asm gemm test (#14925) 2026-02-21 11:56:05 +09:00
qazal
f590564bf7 gemm multiple is only for cdna4 asm (#14814)
* gemm multiple is only for cdna4 asm

* move to backend

* and arch

* path
2026-02-17 14:00:02 +09:00