tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-15 01:15:49 +08:00

Author	SHA1	Message	Date
wozeparrot	b45edeb965	fix: rand supports large tensors (#15329 )	2026-03-17 15:45:41 -07:00
wozeparrot	674c760974	embedded bwd vocab shard (#15001 ) * fix: remove more multi from call * feat: embedding bwd vocab sharding * clean: unused import * clean: don't actually need this pattern	2026-03-16 19:37:16 -07:00
qazal	33bd33e783	sqtt: add CDNA ops enum, show in viz (#15140 )	2026-03-17 09:38:42 +09:00
qazal	5cd1daa3bc	cdna asm_gemm in one file, remove old rdna3 asm (#15281 )	2026-03-16 04:32:30 +09:00
chenyu	842c978df3	remove staticmethod dtypes.max/min (#15227 ) always use x.dtype.max/min	2026-03-11 23:11:24 -04:00
b1tg	18dc77ccab	add fp8 fnuz dtypes with PYTHON backend support (#14945 ) * add fp8 fnuz dtypes with PYTHON backend support * rm emu related change * clarify fp8 fnuz zero handling * Revert "rm emu related change" This reverts commit `efa4763c22`. --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2026-03-11 22:30:18 -04:00
Christopher Milan	25d86ec9e1	start using Invalid in image_conv2d (#15208 )	2026-03-10 07:11:06 -04:00
Christopher Milan	7810be8d3c	compile QCOM without opening device (#15165 ) Co-authored-by: Comma Device <device@comma.ai>	2026-03-06 06:24:27 -05:00
Roelof van Dijk	d65923bda5	tensor.py: add normalize function (#15159 ) * tensor.py: add normalize function * p==0 should match torch	2026-03-05 18:55:53 +08:00
chenyu	fae400d300	update assign tests to also test the expected behavior (#15132 )	2026-03-04 11:34:43 -05:00
nimlgen	563d5c3211	more graph tests (#15130 )	2026-03-04 19:01:12 +03:00
Christopher Milan	592f9bf6c6	set OPENPILOT_HACKS=1 to enable replace assign (#15123 )	2026-03-04 05:26:04 -05:00
George Hotz	2d72a4a90c	fix copying padded const (#15116 ) * fix const padding cpu * remove comment	2026-03-04 10:39:45 +08:00
wozeparrot	c35de9bd68	asm_gemm: support more sharding (#15002 )	2026-03-02 23:16:37 -08:00
Christopher Milan	c70e8af068	move IMAGE FLOAT16 logic to allocations (#15095 ) * FLOAT16 logic in allocations * cleanup * separate that * only apply when IMAGE == 1 * test passing now * create image buffers earlier	2026-03-02 22:00:05 -05:00
George Hotz	d483e4153a	buffer view is like buffer (#15082 ) * buffer view is like buffer * fix * swap_reshape_shrink * contiguous on gguf, fix overlap * revert that * _device_supports_view * this * fix that test * 0 buffers * that test was wrong * this * check correct size * contig BUFFER_VIEW * this * fix tests * buffer view tests * om * fix torch * no MOCKGPU * skip	2026-03-03 09:52:33 +08:00
Christopher Milan	977c270774	IMAGE=1 kernel count failing tests (#15083 )	2026-03-02 04:35:26 -05:00
George Hotz	3539693555	Support triu variable on diagonal + SDPA symbolic (#15081 ) * triu variable * fails * dumbbb * no commutative in reshape * real fix * revert that * sdpa symbolic tests	2026-03-02 12:19:48 +08:00
Nick	8e8e9f6ff6	assert removal for _tri() + tests (#15073 ) * assert removal for _tri() and tests * removed import * tests triu/tril like in prefill --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2026-03-02 10:34:28 +08:00
chenyu	151608aa90	update test_multiple_to_single_device (#15056 ) follow up to #14482, add SCACHE=0 to the test	2026-02-27 21:44:33 -05:00
chenyu	5fd06f4f02	differentiable setitem (#15054 ) * differentiable setitem go through the where path for bw * no return	2026-02-27 17:25:15 -05:00
chenyu	db6b3e1edc	fix mixed setitem with both basic and tensor indexing (#15050 )	2026-02-27 15:35:48 -05:00
chenyu	1406d49eef	failed test cases for advanced setitem (#15048 )	2026-02-27 10:50:18 -05:00
chenyu	0f94a4bb73	failed test case for early fixup const copy (#15038 ) * failed test case for early fixup const copy wrong with PAD * test no copy	2026-02-26 19:09:33 -05:00
chenyu	3a4db53b43	raise RuntimeError in schedule for conflicted var_val [pr] (#15031 )	2026-02-26 15:16:01 -05:00
chenyu	127136421d	enable a few WEBGPU isnan tests that work now (#14967 ) * enable a few WEBGPU isnan tests that work now * still failed	2026-02-23 11:06:08 -05:00
ttomsa	0366474089	Bool cast to cmpne (#14544 ) * test * rm in llvmir * rm in ptx and nir * hmmmm * rm in decompositions * skip tests * add test * just this * rm comment --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2026-02-23 10:31:36 -05:00
George Hotz	b824490e3f	allocate generates a call (#14958 ) * allocate generates a call * symbolic works too * DEFINE_VAR is param * replace param later * apply buffers * name * upd * this was a bug...	2026-02-23 15:59:20 +08:00
George Hotz	677145b393	all consts have shapes (#14959 ) * all consts have shapes * vconst has shape too * use normal schedule * cast ptrdtype * image * bitcast issue + hack	2026-02-23 10:26:50 +08:00
chenyu	4424757b9a	update test_sharded_memory (#14956 ) cleaned up and moved to test/null	2026-02-22 16:56:08 -05:00
b1tg	f9b7493e7a	cleanup fp8 conversion helpers and fp8 edge-case tests (#14953 ) Co-authored-by: b1tg <b1tg@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2026-02-22 09:16:42 -05:00
chenyu	0255a64a27	update test_jit_init_empty (#14938 ) * update test_jit_init_empty now it fails silently * that	2026-02-21 09:01:50 -05:00
George Hotz	8ef5544e4a	realized PYTHON copies (#14934 ) * realized PYTHON copies * comment that out * fix that test * append afters * contig * disk copies * should be 124 * 332	2026-02-21 20:29:31 +08:00
qazal	8278886cf9	test_profiler cleanup, non flaky cpu_profile test (#14932 ) * test_profiler cleanup, non flaky cpu_profile test * existing device is okay	2026-02-21 16:58:10 +09:00
qazal	c5029fa460	jit case with Tensor.empty input, realized means allocated (#14930 ) * simple failing jit test case with Tensor.empty * this used to exist in ops.py... * Revert "removed if self.buffer.is_allocated() in realized (#14836)" This reverts commit `72cf603805`.	2026-02-21 16:33:55 +09:00
qazal	5b6fcd1cda	gemm/asm: smallest cdna4 asm gemm test (#14925 )	2026-02-21 11:56:05 +09:00
George Hotz	df7774661a	remove late numbering of UOps (#14923 ) * remove late numbering of UOps * stupid fix * dead code	2026-02-21 09:18:48 +08:00
chenyu	24286c5593	fix clone for multi (#14919 ) also update empty_like to make sure it's backed by buffers	2026-02-20 17:21:09 -05:00
Nicolas Pinto	aa905db7f7	ptx: use setp.neu for float CMPNE (#14805 ) * ptx: use setp.neu for float CMPNE * test ptx float CMPNE renders setp.neu * check NaN behavior, not grep ptx strings... * skip WEBGPU for test_cmpne_nan (Vulkan NaN behavior) --------- Co-authored-by: Nicolas Pinto <41171+npinto@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2026-02-20 16:11:04 -05:00
George Hotz	2611907afb	start ripping out old scheduler -- no maps (#14909 ) * start ripping out old scheduler -- no maps * no more metadata	2026-02-20 21:05:04 +08:00
George Hotz	55d3a5def9	preallocate all realized buffers (#14823 ) * preallocate all realized buffers * contiguous * work * comment that out * move to schedule * better * correct fix * just buffer * disk bufs * fixes disk tensor stuff * fix symbolic stuff * fix multi * 162 failures * bugfixes * don't check that anymore * fix schedule tests * mnist should be contiguious * type and buffer * fix tests * shrink axis correction * mypy fixes * tests skips * same 37 failures * dedup * no shrink in the graph * 29 failures * skips * fix custom kernel * fix training * those optimizations aren't supported currently * simpler * more correct * tests * 14 failures * works * fix that test * broken * 11 failures * only kernel counts left * fixes * all tests pass * remove tensor_map * op test * 200 -> 230 * test fixes * fixes * revert test_tiny thing * guard * revert that * test tiny passes * no contigs there * base realize back * Revert "no contigs there" This reverts commit `c45bb9fcfd`. * revert that * chop many assigns * 12 failures * fix tests * tests * apply after * pre-commit * remove old code * delete that * fix types * remove extra contig * fix dataloader * torch fix * disk fix * update kernel fusion numbres * runs on amd * restore kernel count * add that rule back * that * disable that * wrong * add the correct rule for that folding * more tests * guard c1.arg * no newlines * realize those * split into a different file * remove detach/contig back * skip 2 * update that	2026-02-20 20:05:54 +08:00
George Hotz	fc5677c28b	resnet dataloader + more test cleanups (#14899 ) * resnet dataloader * tests	2026-02-20 10:05:47 +08:00
chenyu	52f727738b	move test_grouped_dims to test/null (#14893 ) it's a pure helper	2026-02-19 14:50:53 -05:00
George Hotz	2f0f8b5776	more test relaxations from prealloc_bufs (#14880 )	2026-02-19 14:23:28 +08:00
George Hotz	ab61c16730	fixes and test relaxations from prealloc_bufs (#14875 ) * fixes and test relaxations from prealloc_bufs * fix error type and guard _mop * revert that * contiguous makes extra/torch_backend/test_kernel_fusion.py fail	2026-02-19 11:37:25 +08:00
chenyu	0c85b93938	support shink sharded and non-sharded axes (#14874 ) simpler to just support it	2026-02-18 20:54:10 -05:00
chenyu	8c830c5b44	test_full_like_shrink_on_shard_axis (#14870 ) * test_full_like_shrink_on_shard_axis add a test case that triggers non-copy branch in mstack_early_shrink * 0	2026-02-18 19:23:44 -05:00
chenyu	f771de6738	gc.collect() to get the correct GlobalCounters.mem_used in tests (#14868 ) test can be flaky if gc happens in between	2026-02-18 15:01:23 -05:00
chenyu	f84a11bb9f	delete uneven shard tests and mentions (#14867 )	2026-02-18 14:10:33 -05:00
George Hotz	af839b2bd1	remove all the outerworld stuff, it was too complex (#14852 )	2026-02-18 17:44:11 +08:00


226 Commits