tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-13 16:37:04 +08:00

Author	SHA1	Message	Date
nimlgen	77965a22e5	local optimize as rewrite (#15953 ) * local optimize as rewrite * better * x * slighly rename * fix * ugh * remove * x * remove * not weak	2026-04-28 22:51:04 +03:00
nimlgen	4164666c72	programinfo (#15942 ) * programinfo * fix * m * x * x * changes * x * fix * rm	2026-04-27 23:12:03 +03:00
nimlgen	bb652352c7	remove execitem (#15932 ) * remove execitem * f * x	2026-04-25 19:33:04 +03:00
nimlgen	768106a542	remove schedule from extra/docs/examples (#15929 ) * remove schedule from extra/docs/examples * f	2026-04-25 14:09:12 +03:00
nimlgen	f2751955cb	remove linear_to_schedule from tests (#15912 ) * remove linear_to_schedule from tests * x	2026-04-24 20:02:10 +03:00
chenyu	9192c93b7e	Tensor.invalid -> Tesnor.invalids (#15849 ) matches ones and zeros, and to not share name with UOp.invalid	2026-04-21 11:19:51 -04:00
nimlgen	bfe28ee2ad	rm run_schedule (#15847 )	2026-04-21 18:14:30 +03:00
wozeparrot	9e60e4a7e7	llama: native fp8 (#15733 )	2026-04-16 22:16:05 -07:00
qazal	12c653a743	remove opts arg in get_program, everything uses opts_to_apply [pr] (#15767 ) * check Ops.BEAM in process replay * remove opts from the get_program api * lint * simplify * cleanup	2026-04-16 22:42:43 +03:00
chenyu	3394d18066	size*itemsize -> nbytes (#15729 ) and some UOp.size removal to prep for size to mixin change	2026-04-14 16:27:54 -04:00
wozeparrot	55bcd7cc9e	llama amax outside (#15670 )	2026-04-09 23:08:03 -07:00
George Hotz	48a7627b04	add RDNA4 support to copy WMMA (#15663 ) * add RDNA4 supportt to copy WMMA * simpler * simpler * comment * assert	2026-04-09 22:48:20 +08:00
George Hotz	1ebeb52e59	RDNA4 asm gemm (#15427 ) * sqtt: rdna4 decoder work * diff cleanup * more diff * test * 125 * r4 --------- Co-authored-by: qazal <qazal.software@gmail.com> Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2026-04-08 21:26:44 +08:00
wozeparrot	70dbd35023	llama: move custom_kernel into flat_llama (#15643 )	2026-04-08 00:19:14 -07:00
wozeparrot	7e54992bf6	fp8 llama (#15588 ) Co-authored-by: qazal <qazal.software@gmail.com>	2026-04-04 18:24:57 -07:00
Christopher Milan	0ed8d9271d	Renderers accept Target or nothing (#15590 )	2026-04-03 01:09:41 -04:00
qazal	fefb0ebc2a	gemm/asm: fp8 cleanups (#15580 ) * normal gemm here * s/dtypes.fp8e4m3/FP8_DTYPE * gemm_bw * device UOp stays NULL	2026-04-02 19:02:38 +09:00
chenyu	1aa04eab08	simple CreationMixin (#15567 ) start with full_like, zeros_like, ones_like	2026-04-01 23:00:56 -04:00
qazal	8feb8edc68	gemm/asm: add fp8 support to cdna asm_gemm (#15542 ) * work * hmm, mixins * rhs_transposed * also fix the dtype * check for hipcc * Exception * select dev * default	2026-03-31 19:32:54 +09:00
George Hotz	85dee83f5d	amd flash attention cleanups + emulator fixes (#15431 ) * amd flash attention cleanups * simpler * params * fix emulator bugs * fix idiv bug * remove that test * more emu fixes	2026-03-24 10:10:46 +08:00
George Hotz	c62dea6881	ai slop flash attention (it works) (#15401 ) * ai slop flash attention (it works) * speed up, 2 TFLOPS + 7 GB/s * simpler * simpler * optimize * faster * warp shuffle * sqtt: link dispatch to exec (#15396) * sqtt packet linking infra python * javascript * ~doubly linked list * ui works * work * exec can also highlight the pc, coloring work * more work * rm sqtt/model.py, doesn't need to be upstreamed * viz: no context enters in cli, update llama profile (#15404) * removed unused named arg in rules [pr] (#15414) * viz: sqtt printer in viz/cli.py (#15411) * work * sqtt timeline in CLI * format all printers nicely * s/Showed/Printed * ansistrip * sys.exit * keep colors in list * work from amd_copy_matmul * has_more always gets returned * linter * don't print colors * more colors * wow this is so deep * work * minor details * selected * improve progress bar * remove it * 22, global_load_vaddr is so long * remove 0 hack in sign, gradient materializes zeros for unconnected nodes (#15416) Amp-Thread-ID: https://ampcode.com/threads/T-019d1612-6322-706b-a94d-a812400a55cb Co-authored-by: Amp <amp@ampcode.com> works * cnt=20 * revert that * uop slice tests * simpler --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com> Co-authored-by: gg <ggordbegli@gmail.com> Co-authored-by: Amp <amp@ampcode.com>	2026-03-23 16:15:10 +08:00
George Hotz	c13d9d29ff	add SHAPED_WMMA (#15400 ) * add SHAPED_WMMA * shaped wmma * less bad	2026-03-21 16:16:03 +08:00
George Hotz	41a9b09683	minimal vec in amd_copy_matmul (#15398 ) * minimal vec in amd_copy_matmul * unified * unify * reshape/permute * cleanups * simpler * move index * cleanups * more shared	2026-03-21 14:57:21 +08:00
George Hotz	1a2a203f48	add wmma support to amd_copy_matmul (#15384 ) * add wmma support to amd_copy_matmul * 15 TFLOPS and merged * unify * simpler * simpler * simpler * cleanups * TM/TN is the full regs * comments * WAVES_PER_SH + SQTT_EVENT * Add WAVERDY support * no split warp * 3 range	2026-03-20 19:02:19 +08:00
chenyu	da1700e16b	dtypes.index -> dtypes.weakint (#15377 )	2026-03-20 01:08:46 -04:00
George Hotz	4091d37e8e	flat llama step work (#15355 ) * flat llama step work * fp8 support * blacklisted matmul * chestertons fence	2026-03-20 09:06:12 +08:00
George Hotz	6e196195d8	add test for flat llama (#15327 ) * add test for flat llama * simpler * back to split w1/w3 * env * still too much ram * invalid	2026-03-18 15:16:33 +08:00
qazal	5cd1daa3bc	cdna asm_gemm in one file, remove old rdna3 asm (#15281 )	2026-03-16 04:32:30 +09:00
George Hotz	06d7cddb33	amd_copy_matmul is cleaner (#15248 ) * amd_copy_matmul is cleaner * it runs * replicated stuff * add tid there * it runs * cleanup * x.src[1] * flatten * move that * keep that assert	2026-03-14 12:56:09 +08:00
George Hotz	a7d2429c21	amd_uop_matmul more cleanups (#15240 )	2026-03-13 10:24:43 +08:00
George Hotz	e560a46f59	update amd_uop_matmul (#15236 ) * update amd_uop_matmul * use custom kernel * simpler * ignore	2026-03-12 17:33:12 +08:00
wozeparrot	c35de9bd68	asm_gemm: support more sharding (#15002 )	2026-03-02 23:16:37 -08:00
qazal	62ee976c1b	gemm/asm: cleanup repeated patterns to helper functions (#15094 )	2026-03-03 08:14:47 +09:00
qazal	448e997be4	gemm/asm: cleanup custom function args (#15007 )	2026-02-25 22:05:56 +09:00
qazal	f590564bf7	gemm multiple is only for cdna4 asm (#14814 ) * gemm multiple is only for cdna4 asm * move to backend * and arch * path	2026-02-17 14:00:02 +09:00
George Hotz	5bd2862d1a	late compile the cdna gemm (#14783 ) * late compile the cdna gemm * remove old things * finalize inplace --------- Co-authored-by: qazal <qazal.software@gmail.com>	2026-02-17 13:04:22 +09:00
George Hotz	f081f154ae	parameterize the CDNA asm gemm (#14813 ) * parameterize the CDNA asm gemm * fix llama test * fix * add more gemmt ests * confirm all match * test these asm gemms	2026-02-17 11:35:18 +08:00
qazal	c7a4dbf918	viz: get program binary from the UOp (#14787 ) * viz: get program binary from the UOp * remove that * less * rename View Program to View Source * two words * fix	2026-02-16 15:46:58 +09:00
George Hotz	dff9cf35c2	amd asm emulator fixes + run it in CI (#14786 ) * amd asm fix, try 2 * fix tests	2026-02-16 13:24:21 +08:00
qazal	55a4dfa2e0	cdna4 asm_gemm tests in CI on the null backend (#14785 ) * cdna4 asm_gemm tests in CI on the null backend * no .numpy() in null * better * gemm/asm: device comes from renderer	2026-02-16 14:06:23 +09:00
George Hotz	4088d686b2	remove llvm requirement from amd (#14717 ) * remove llvm requirement from amd * tests pass * test * sink kernarg_size * move stuff * amd_asm_matmul to new style * default type * fix tests, simpler * cu mode is faster and simpler * darken	2026-02-13 10:50:12 +08:00
George Hotz	4680247e35	renderer/amd: move in tree (#14702 ) * renderer/amd: move in tree * fix paths in tests * 24000 lines * no delete for amd files	2026-02-12 18:09:16 +08:00
George Hotz	befc1e800c	assembly/amd: disasm is test only (#14694 ) * assembly/amd: disasm is test only * viz uses str	2026-02-12 12:33:46 +08:00
George Hotz	3fab43c57c	add cache to asm gemm (#14675 )	2026-02-11 08:26:30 +08:00
qazal	80b0119cef	llama: add new asm gemm shape (#14611 ) * llama: add new asm gemm shape * work * cleanup * half dtype * more comment	2026-02-10 00:34:29 +09:00
George Hotz	183d38b128	remove CUSTOM_KERNEL / directly construct it (#14604 ) * remove CUSTOM_KERNEL / directly construct it * clean that up * simpler multi * custom kernel spec * remove Kernel * fix multi * use sharded shape * explicit regression test	2026-02-08 18:43:33 +08:00
qazal	cf73d7e2a7	hotfix: disable slower asm gemm shape from llama seqlen 8192 (#14582 )	2026-02-06 15:05:19 +09:00
George Hotz	43e7eda4e7	grad_b uses custom gemm (#14550 ) * grad_b uses custom gemm * fix multi backward, acc is in float32 * test_gemm_batched * square gemm --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com> Co-authored-by: qazal <qazal.software@gmail.com>	2026-02-05 15:22:27 +09:00
qazal	f9cfb64cd9	test asm_gemm in CI (#14551 ) * test asm_gemm in CI * default float16 * use a smaller shape for multi * smaller size * smaller for CI * smaller for ci * need half	2026-02-05 13:32:22 +09:00
chenyu	d57d24c7d4	Buffer.as_buffer -> Buffer.as_memoryview [pr] (#14535 ) it casts to memoryview. also inline the as_typed_buffer checks to Tensor._data	2026-02-04 11:31:11 -05:00

236 Commits