Commit Graph

18 Commits

Author SHA1 Message Date
nimlgen
768106a542 remove schedule from extra/docs/examples (#15929)
* remove schedule from extra/docs/examples

* f
2026-04-25 14:09:12 +03:00
Christopher Milan
645d45d968 DEV has arch (#15577)
Co-authored-by: Comma Device <device@comma.ai>
2026-04-03 19:17:19 -04:00
Christopher Milan
0ed8d9271d Renderers accept Target or nothing (#15590) 2026-04-03 01:09:41 -04:00
George Hotz
c0de4f75b1 improve mmapeak, print names with sqtt (#14726) 2026-02-13 16:07:06 +08:00
George Hotz
4088d686b2 remove llvm requirement from amd (#14717)
* remove llvm requirement from amd

* tests pass

* test

* sink kernarg_size

* move stuff

* amd_asm_matmul to new style

* default type

* fix tests, simpler

* cu mode is faster and simpler

* darken
2026-02-13 10:50:12 +08:00
George Hotz
4680247e35 renderer/amd: move in tree (#14702)
* renderer/amd: move in tree

* fix paths in tests

* 24000 lines

* no delete for amd files
2026-02-12 18:09:16 +08:00
qazal
f866b2a513 mfma loop in asm dsl (#14349)
* mfma loop in asm dsl

* work
2026-01-27 11:11:37 +09:00
qazal
2d91fe6310 use amdgpu dsl in mmapeak (#14342)
* use amdgpu dsl in mmapeak

* don't rely on llvm for vgpr counting

* llvm roundtrip assert

* rm it, add ci

* vgpr_count

* move emulated test to amd, it needs comgr

* env

* arch

* inst._fields -> inst.operands

* vgpr offset
2026-01-26 22:03:43 +09:00
qazal
dff5f361b0 support rendering assembly kernels on the NULL backend (#14283)
* assembly custom kernels in DEV=NULL, use renderer arch

* update mmapeak

* llvm
2026-01-22 15:49:07 +09:00
qazal
3f3786ded9 mmapeak: fix compiler import (#13915) 2025-12-31 16:52:23 +09:00
George Hotz
97b56e11e0 hotfix: 32 workgroups for radeon 8050s 2025-11-30 08:20:17 -08:00
George Hotz
cabd4add48 more work parsing SQTT, separate VIZ/PROFILE (#13308)
* more work parsing SQTT

* more minimal runner

* sep VIZ/PROFILE

* parse print new

* improve parser

* more filter

* that

* split them

* lil cleanup

* skip flaky test

* AQL in mmapeak
2025-11-16 10:40:39 -08:00
George Hotz
ba84d415fe work from benchmarking tinybox red v2 (#13264)
* work from benchmarking tinybox red v2

* gpuburn
2025-11-13 16:38:40 -08:00
George Hotz
65a0a31475 AMD mi350x matmul from stream (#13040)
* works

* working mfma

* 120 TFLOPS

* regs

* 192 TFLOPS

* try pipelining

* something

* notes

* contract

* linter to 3.11

* that was a bug
2025-11-01 17:55:19 +08:00
nimlgen
59784a5972 amd: ensure ts is written (#12794) 2025-10-19 23:55:49 +08:00
George Hotz
89e7f2fa00 mmapeak: gfx1103 support 2025-10-19 16:57:28 +08:00
George Hotz
617614beb7 add mi350x support to mmapeak (#12784) 2025-10-19 16:11:07 +08:00
Panagiotis Kourouklidis
e21836952d mmapeak implementation for 7900 XTX (#10417)
* Add mmapeak implementation for 7900 XTX

* Change identation

* Use a template instead of multiple assebly files

* Fix output formatting

* Reduce register file bank conflicts

* More accurate measurement for quick instructions

* Add support for gfx1201

* RDNA4 wmma requires less VGRPs

* RDNA4 does not have s_cmpk instructions

* Add v_wmma_i32_16x16x32_iu4 for gfx1201

* Add sparse wmma instructions

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-23 16:26:12 -07:00