tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-15 01:15:49 +08:00

Author	SHA1	Message	Date
George Hotz	770dac0e0d	broadcast	2026-05-14 17:04:37 -07:00
George Hotz	b827858479	broadcast shape	2026-05-14 17:01:20 -07:00
chenyu	09096ea565	test_gradient_through_clone (#16203 ) backward through clone crashes now	2026-05-14 19:26:47 -04:00
George Hotz	d4dcd8487b	aggressive shape check to prepare for broadcasting (#16202 ) * add implicit broadcasting to shape * NOOP/ALLREDUCE fixes	2026-05-14 16:15:44 -07:00
George Hotz	83ec66da34	fix a fastdiv edge case (#16199 )	2026-05-14 13:12:18 -07:00
nimlgen	62ea73719d	hcq2: share more with graph (#16196 ) * share more with graph * comment	2026-05-14 22:28:11 +03:00
George Hotz	3b8cc31759	disable fast idiv by default, it's broken (#16197 ) * disable fast idiv by default, it's broken * fix fast idiv tests	2026-05-14 11:48:27 -07:00
Christopher Milan	8f811649ff	better compiler_cpu invalid arch errors (#16194 )	2026-05-14 14:36:14 -04:00
qazal	f03a7fd6d1	viz/cli: readable uop json (#16195 ) * viz/cli: readable uop json repr * work * better	2026-05-14 21:33:10 +09:00
C T	1b779a9058	add gelu approximate="none" (match pytorch) (#16162 ) * add gelu approximate="none" (match pytorch) * lint * pass through onnx Gelu approximate * type annotate * explicit math.sqrt * keep tinygrad's gelu approximate="tanh" default	2026-05-13 18:53:24 -07:00
chenyu	dd9187d9ee	minor hash cleanups (#16190 ) same kernels	2026-05-13 20:59:24 -04:00
wozeparrot	88ac2ac1fd	llama: cleanups (#16189 )	2026-05-13 17:08:06 -07:00
Christopher Milan	9a365d9978	ci: fix null image tests (#16188 )	2026-05-13 18:00:05 -04:00
nimlgen	ad1fb7c981	hcq2: graph (#16186 ) * keep this for now * early graph	2026-05-13 22:49:43 +03:00
chenyu	3f9f6a51b2	minor image_conv2d cleanup (#16187 ) remove some no-op slices	2026-05-13 15:47:40 -04:00
b1tg	59c34b9fe0	llm: precise device (#16159 ) * llm: precise device * llm: pass device to precompute_freqs_cis	2026-05-12 21:16:42 -07:00
b1tg	3c806ff406	clean up gguf (#16160 )	2026-05-12 21:16:10 -07:00
wozeparrot	e97f2c1114	llama: only gemm + fa custom kernel (#16180 ) * llama: tie store to grad directly * llama: set mp flags * llama: non fused grad fp8 quantize path	2026-05-12 21:03:49 -07:00
chenyu	38d407fd58	simplify svd more (#16181 ) all the slowness is scheduling	2026-05-12 23:48:22 -04:00
Christopher Milan	f1fdd2ccec	ci: add IMAGE=1 compile-only tests (#16182 ) * ci: add IMAGE=1 compile-only tests * fix	2026-05-12 23:40:32 -04:00
George Hotz	faf7fb7513	update nir renderer for new image style (#16179 ) * update nir renderer for new image style * don't cast image indexes	2026-05-12 20:25:01 -07:00
Christopher Milan	7d0c5ab689	ci: ocelot needs nvcc on linux (#16178 ) * ci: ocelot needs nvcc on linux * cudart	2026-05-12 23:13:48 -04:00
chenyu	32138c2418	svd to mixin (#16175 )	2026-05-12 22:29:01 -04:00
George Hotz	69e1f3b551	remove vec2 from image in gater (#16165 ) * remove vec2 from image in gater * only simple idx * fix python with new image style * fix vconst * just vconst and stack * cast to int there * fix for const * fix process replay	2026-05-12 19:25:52 -07:00
chenyu	2172363be5	don't use Tensor indexing in svd (#16174 ) prepare mixin, also about 4X faster for 8x8 input	2026-05-12 21:56:19 -04:00
chenyu	420a08c6d1	qr to mixin (#16173 )	2026-05-12 21:23:25 -04:00
chenyu	c6a82fe927	functional qr and svd (#16172 ) no clone and setitem, will move to mixin next. slightly faster but still quite slow	2026-05-12 19:12:08 -04:00
Christopher Milan	3844a31f87	ci: untangle cuda/ocelot, less apt (#16171 ) * ci: untangle cuda/ocelot, less apt * ldconfig	2026-05-12 18:14:03 -04:00
Christopher Milan	316607f004	dsp: don't use docker in ci (#16167 ) * dsp: don't use docker in ci * add setup script for macos docker	2026-05-12 17:11:03 -04:00
chenyu	bdcdf1f1a1	jittable masked_select and nonzero (#16170 ) * jittable masked_select and nonzero make jittable with `size=`, matches jax * COMPILE_ONLY	2026-05-12 16:39:36 -04:00
wozeparrot	a613bcfc6d	allow after on contiguous in spec (#16169 ) * feat: allow after on contiguous * feat: add test	2026-05-12 13:11:44 -07:00
chenyu	7c3e3fa154	fix empty input for masked_select and nonzero (#16168 )	2026-05-12 15:36:51 -04:00
chenyu	da3b7e89a4	atol in test_custom_kernel_multi_output_backward_interacting (#16166 )	2026-05-12 14:42:12 -04:00
chenyu	25583f6dc1	fix cumsum dtype for 0d input (#16164 )	2026-05-12 14:18:08 -04:00
George Hotz	64c81dfd24	add all codegen stages to spec_tensor (#16163 )	2026-05-12 10:35:38 -07:00
chenyu	f3e3c3851f	explicit args to Tensor.rand (#16161 ) added requires_grad, other kwargs were silently dropped	2026-05-12 12:53:39 -04:00
nimlgen	e93fb5f9b9	hcq2: remove hcqprogram (#16157 ) * hcq2 rm program * nonbeauty * no prog * tiny * f * x	2026-05-12 18:49:13 +03:00
nimlgen	a708542308	fix ci spec (#16156 )	2026-05-12 17:57:11 +03:00
nimlgen	e5729935c6	time_call (#16152 ) * time_call * x * fix caches	2026-05-12 16:58:28 +03:00
qazal	fe39cf148a	add Ops.SOURCE test (#16155 ) * simple failing test * raises * change	2026-05-12 22:49:32 +09:00
qazal	5cd0494b14	viz: canonicalize ast for schedule to codegen linking (#16154 ) * simple failing test * always null device * viz: canonicalize ast for schedule to codegen linking * SCACHE	2026-05-12 22:40:21 +09:00
qazal	c1d125ff3b	llm: add markers to --benchmark (#16153 ) * markers in llm * ui fix	2026-05-12 20:14:11 +09:00
wozeparrot	e9359d9e7d	more llama mp fixes (#16151 ) * llama: SPLIT_W13 * llama: fix with no fused kernels * llama: cast to bf16 on non asm_gemm path * llama: new mp flags	2026-05-11 21:29:23 -07:00
chenyu	09fd80fba6	fix randperm and _multi_like drop requires_grad (#16150 )	2026-05-11 23:23:34 -04:00
George Hotz	8294d105a7	Update the spec in spec.py to match the current state (#16132 ) * start work on specv2 * more spec * more spec * fix amd emulator * more spec * more * fix test_uop_graph * move those * spec=2 * skip those questionable tests * ptx fix * more spec=2 * store * allow custom function in tensor * spec 2 * fix beam search for tensor cores * delete the old specs * fix import	2026-05-11 20:07:47 -07:00
chenyu	3942a80f66	fix wrong kwargs passed into rands (#16149 ) working towards explicit args for these	2026-05-11 22:22:06 -04:00
Christopher Milan	039d84ff02	Revert "onnx: deduplicate simple proto parsers" (#16148 ) This reverts commit `83eaefcd0f`.	2026-05-11 21:45:17 -04:00
Christopher Milan	20f587d5d5	nv: rm _download (#16147 )	2026-05-11 19:56:37 -04:00
chenyu	371ab2023f	clean up image_dot and image_conv2d (#16145 )	2026-05-11 19:37:58 -04:00
Vikram Rangarajan	effa263865	Torch backend `aten::cat.out` fix (#16121 ) * Handle empty 1D tensors in cat_out * Undid other changes * Fixed torch cat * Improved cat.out, added more tests * Cleaned code * Type hinted dim * Removed whitespace	2026-05-11 16:28:16 -07:00

13294 Commits