tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-13 16:37:04 +08:00

Author	SHA1	Message	Date
nimlgen	768106a542	remove schedule from extra/docs/examples (#15929) * remove schedule from extra/docs/examples * f	2026-04-25 14:09:12 +03:00
Christopher Milan	645d45d968	DEV has arch (#15577) Co-authored-by: Comma Device <device@comma.ai>	2026-04-03 19:17:19 -04:00
Christopher Milan	0ed8d9271d	Renderers accept Target or nothing (#15590)	2026-04-03 01:09:41 -04:00
George Hotz	c0de4f75b1	improve mmapeak, print names with sqtt (#14726)	2026-02-13 16:07:06 +08:00
George Hotz	4088d686b2	remove llvm requirement from amd (#14717) * remove llvm requirement from amd * tests pass * test * sink kernarg_size * move stuff * amd_asm_matmul to new style * default type * fix tests, simpler * cu mode is faster and simpler * darken	2026-02-13 10:50:12 +08:00
George Hotz	4680247e35	renderer/amd: move in tree (#14702) * renderer/amd: move in tree * fix paths in tests * 24000 lines * no delete for amd files	2026-02-12 18:09:16 +08:00
qazal	f866b2a513	mfma loop in asm dsl (#14349) * mfma loop in asm dsl * work	2026-01-27 11:11:37 +09:00
qazal	2d91fe6310	use amdgpu dsl in mmapeak (#14342) * use amdgpu dsl in mmapeak * don't rely on llvm for vgpr counting * llvm roundtrip assert * rm it, add ci * vgpr_count * move emulated test to amd, it needs comgr * env * arch * inst._fields -> inst.operands * vgpr offset	2026-01-26 22:03:43 +09:00
qazal	dff5f361b0	support rendering assembly kernels on the NULL backend (#14283) * assembly custom kernels in DEV=NULL, use renderer arch * update mmapeak * llvm	2026-01-22 15:49:07 +09:00
qazal	3f3786ded9	mmapeak: fix compiler import (#13915)	2025-12-31 16:52:23 +09:00
George Hotz	97b56e11e0	hotfix: 32 workgroups for radeon 8050s	2025-11-30 08:20:17 -08:00
George Hotz	cabd4add48	more work parsing SQTT, separate VIZ/PROFILE (#13308) * more work parsing SQTT * more minimal runner * sep VIZ/PROFILE * parse print new * improve parser * more filter * that * split them * lil cleanup * skip flaky test * AQL in mmapeak	2025-11-16 10:40:39 -08:00
George Hotz	ba84d415fe	work from benchmarking tinybox red v2 (#13264) * work from benchmarking tinybox red v2 * gpuburn	2025-11-13 16:38:40 -08:00
George Hotz	65a0a31475	AMD mi350x matmul from stream (#13040) * works * working mfma * 120 TFLOPS * regs * 192 TFLOPS * try pipelining * something * notes * contract * linter to 3.11 * that was a bug	2025-11-01 17:55:19 +08:00
nimlgen	59784a5972	amd: ensure ts is written (#12794)	2025-10-19 23:55:49 +08:00
George Hotz	89e7f2fa00	mmapeak: gfx1103 support	2025-10-19 16:57:28 +08:00
George Hotz	617614beb7	add mi350x support to mmapeak (#12784)	2025-10-19 16:11:07 +08:00
Panagiotis Kourouklidis	e21836952d	mmapeak implementation for 7900 XTX (#10417) * Add mmapeak implementation for 7900 XTX * Change identation * Use a template instead of multiple assebly files * Fix output formatting * Reduce register file bank conflicts * More accurate measurement for quick instructions * Add support for gfx1201 * RDNA4 wmma requires less VGRPs * RDNA4 does not have s_cmpk instructions * Add v_wmma_i32_16x16x32_iu4 for gfx1201 * Add sparse wmma instructions --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-05-23 16:26:12 -07:00

18 Commits