tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-13 00:15:35 +08:00

Author	SHA1	Message	Date
qazal	1e0fffe256	fused ce llama kernel in UOps (#16263 ) * work * using uops * delete things * work * work * higher level uops * cleanups	2026-05-20 19:45:28 +09:00
nimlgen	b3dcf8f452	hcq2: split into schedule/realize (#16216 ) * hcq2: split into schedule/realize * missing * x * f * clean * cleaner * x * x * x * x * x	2026-05-19 16:40:17 +03:00
qazal	e4350e7de9	set hipcc mac docker to 7.1 (#16261 ) * set hipcc mac docker to 7.1 * pull from amd	2026-05-19 21:30:39 +09:00
qazal	bfb2d1f89a	Revert "fp8 gemm speedup (#16236 )" (#16245 ) This reverts commit `d95bf394e1`.	2026-05-19 02:01:44 +09:00
chenyu	dcee90aa3f	remove requires_grad use in extra/examples (#16238 ) except the ones fed into optimizer	2026-05-16 18:40:26 -04:00
qazal	d95bf394e1	fp8 gemm speedup (#16236 ) * add asm_gemm option * milestone * work * edit * only the fast kernel * diff	2026-05-17 04:58:28 +09:00
qazal	d54fa86b71	viz/cli: select all calls in graph by default (#16214 )	2026-05-15 21:01:44 +09:00
nimlgen	28b98e529d	nv: move structs to vram (#16184 ) * nv: vram * x * 4090 * x * move and sysmem on macos * x * remove hp	2026-05-15 13:41:42 +03:00
chenyu	409bb0c9ad	requires_grad cannot be None (#16212 ) final goal is to remove requires_grad, first change the default to True, and don't allow None	2026-05-15 02:01:04 -04:00
nimlgen	62ea73719d	hcq2: share more with graph (#16196 ) * share more with graph * comment	2026-05-14 22:28:11 +03:00
nimlgen	ad1fb7c981	hcq2: graph (#16186 ) * keep this for now * early graph	2026-05-13 22:49:43 +03:00
wozeparrot	e97f2c1114	llama: only gemm + fa custom kernel (#16180 ) * llama: tie store to grad directly * llama: set mp flags * llama: non fused grad fp8 quantize path	2026-05-12 21:03:49 -07:00
Christopher Milan	316607f004	dsp: don't use docker in ci (#16167 ) * dsp: don't use docker in ci * add setup script for macos docker	2026-05-12 17:11:03 -04:00
nimlgen	e93fb5f9b9	hcq2: remove hcqprogram (#16157 ) * hcq2 rm program * nonbeauty * no prog * tiny * f * x	2026-05-12 18:49:13 +03:00
nimlgen	e5729935c6	time_call (#16152 ) * time_call * x * fix caches	2026-05-12 16:58:28 +03:00
chenyu	3942a80f66	fix wrong kwargs passed into rands (#16149 ) working towards explicit args for these	2026-05-11 22:22:06 -04:00
Vikram Rangarajan	effa263865	Torch backend `aten::cat.out` fix (#16121 ) * Handle empty 1D tensors in cat_out * Undid other changes * Fixed torch cat * Improved cat.out, added more tests * Cleaned code * Type hinted dim * Removed whitespace	2026-05-11 16:28:16 -07:00
qazal	fc2cc1d77a	viz: call graph renderer example (#16141 ) * work * emits * this * cleaner repr for custom binaries * --call-graph * _ref * this * start * this * everything execpt the pyrender * bring pyrender back	2026-05-12 05:07:30 +09:00
nimlgen	70c2480e71	hcq2 to extra (#16126 ) * hcq2 in extra * correct * some revert from non-extra * cln * cpu * x * attach * min * remove attach * linter	2026-05-11 17:17:30 +03:00
nimlgen	ad9738892c	get_buf() for Buffer (#16134 ) * p * mypy * x	2026-05-11 16:36:14 +03:00
Christopher Milan	faabe6aa42	nv: remaining firmware from /lib/firmware (#16088 )	2026-05-07 23:07:43 -04:00
Christopher Milan	9a6f7f7576	nv: look for fmc firmware in /lib/firmware (#16080 )	2026-05-07 18:08:27 -04:00
nimlgen	2f0aa884d5	tinygpu: minimal is macos13 for resets (#16075 )	2026-05-07 21:25:56 +03:00
wozeparrot	730fa66bf3	llama speed 6 (#16071 )	2026-05-06 20:51:03 -07:00
wozeparrot	ab6218bc92	llama mp fixes (#16050 )	2026-05-05 15:35:32 -07:00
wozeparrot	528d35e306	llama speed 4 (#15993 )	2026-04-30 17:14:41 -07:00
wozeparrot	eddcd4723b	am_smi throttle info (#15997 )	2026-04-30 15:28:32 -07:00
nimlgen	dfd2d07005	remove CompiledRunner (#15970 ) * rm usage of CompiledRunner * more tests * last * linter * sink * remove * linter	2026-04-29 22:45:48 +03:00
qazal	a37b605523	remove arch from asm kernel class (#15977 ) * rm arch from kernel * update other tests * update abstractions4.py	2026-04-30 03:39:52 +09:00
qazal	b63e0a5f74	viz/sqtt: move amd decoder to extra, don't import from ops_amd (#15969 ) * don't import from ops_amd * start * cleanup	2026-04-30 00:49:15 +09:00
wozeparrot	ef09071073	llama: speed 2 (#15960 )	2026-04-28 20:44:37 -07:00
Christopher Milan	e6863a1cc5	autogen: fewer type: ignores (#15956 )	2026-04-28 21:58:13 -04:00
nimlgen	77965a22e5	local optimize as rewrite (#15953 ) * local optimize as rewrite * better * x * slighly rename * fix * ugh * remove * x * remove * not weak	2026-04-28 22:51:04 +03:00
qazal	b3f0f8d349	llama: fix missing label_smoothing arg (#15955 )	2026-04-29 02:12:14 +09:00
wozeparrot	5e861cd2c4	llama: move llama kernels to llama_kernels (#15952 )	2026-04-27 22:48:53 -07:00
nimlgen	4164666c72	programinfo (#15942 ) * programinfo * fix * m * x * x * changes * x * fix * rm	2026-04-27 23:12:03 +03:00
qazal	8c174bdad4	viz/sqtt: correct exec pipes (#15885 ) * wmma * p2 * test * left * work * pickle * handwritten failing tests * start work * test the pipes * empirical evidence * update rdna4 enum types * VALU pipe 1 * TRANSCENDENTAL pipe * transcendental function units * reorder * wmma pipe * cleanup and notes * smaller * work * diff cleanup * pickle * use se:1 * int	2026-04-28 05:05:49 +09:00
nimlgen	bb652352c7	remove execitem (#15932 ) * remove execitem * f * x	2026-04-25 19:33:04 +03:00
nimlgen	768106a542	remove schedule from extra/docs/examples (#15929 ) * remove schedule from extra/docs/examples * f	2026-04-25 14:09:12 +03:00
Denys Melnyk	1fdcb13bfb	webgpu: fix weight lookup in export_model after compile_net key change (#15919 ) * fix lookup site in export_model_webgpu after refactoring webgpu (sd): fix export_model weight lookup after compile_net changes fix lookup site in export_model_webgpu after refactoring * add regression test	2026-04-25 10:04:55 +03:00
wozeparrot	4b908b6e2c	llama: fused ce loss (#15920 )	2026-04-24 20:01:24 -07:00
nimlgen	f2751955cb	remove linear_to_schedule from tests (#15912 ) * remove linear_to_schedule from tests * x	2026-04-24 20:02:10 +03:00
qazal	f379b5a40a	sqtt: match amd's TS_DELTA_SHORT offset (#15901 )	2026-04-24 06:41:22 +03:00
wozeparrot	d3cbd781d9	llama: use fused norm mul quantize for w13 (#15878 )	2026-04-22 21:27:41 -07:00
nimlgen	e5891acab2	jit: precompile (#15848 ) * x * jit: precompile as sep step * x * s * x * x * x * ? * ? * x * x * viz * f * x * u * x * x	2026-04-23 00:23:32 +03:00
wozeparrot	87378331e8	llama: fused mul quantize fp8 (#15863 )	2026-04-21 20:58:37 -07:00
chenyu	9192c93b7e	Tensor.invalid -> Tensor.invalids (#15849 ) matches ones and zeros, and to not share name with UOp.invalid	2026-04-21 11:19:51 -04:00
nimlgen	bfe28ee2ad	rm run_schedule (#15847 )	2026-04-21 18:14:30 +03:00
nimlgen	ae9b84d32f	rm beam uop (#15844 )	2026-04-21 13:10:26 +03:00
qazal	f9655af2a3	viz/cli: move to tinygrad (#15835 ) * move cli * update imports * cleanup the readme * edit * work * details * python -m tinygrad.viz.cli * do not execv in non tty * option * lint * simpler * gemm pmc	2026-04-21 13:35:10 +09:00

1825 Commits