tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-13 00:15:35 +08:00

Author	SHA1	Message	Date
George Hotz	0337a70a28	BufferSpec and ProgramSpec [pr]	2024-11-21 12:03:56 +08:00
George Hotz	9df5a62c5e	unify to HWQueue [pr] (#7812) * unify to HWCommandQueue [pr] * all is HWQueue	2024-11-21 10:33:08 +08:00
chenyu	11cea00090	lower vs_theoretical conv tflops threshold for nv (#7811) less flaky	2024-11-20 20:03:49 -05:00
ignaciosica	fc3154a7b3	metal bf16 tc support [pr] (#7408) * add bf16 tc for metal * hotfix: spacing * fix tolerance and skip metal bf16 in ci * hotfix: check for dtype_out * hotfix: add check for tc.dtype_out is bf16 back * hotfix: add parens	2024-11-20 14:39:08 -05:00
geohotstan	66a069ee25	add replicate mode to Tensor.pad (#7802) * base implementation * add tests * actually remove the assertionerror test * good	2024-11-20 08:39:58 -05:00
George Hotz	eb0bb7dc0b	final dname to device [pr] (#7806) * final dname to device [pr] * oops, fix nv	2024-11-20 20:20:28 +08:00
George Hotz	bc977fec53	dname -> device [pr] (#7804) * dname -> device [pr] * a few more * only one left	2024-11-20 17:57:14 +08:00
ttomsa	9adeb1041c	fix advanced setitem with 1 in shape (#7797) * fix advanced setitem with 1 in shape * linter	2024-11-19 20:04:59 -05:00
ttomsa	170ece6605	fix advanced setitem overlap with 0 (#7793) * fix advanced setitem overlap with 0 * fix comment	2024-11-19 16:03:55 -05:00
Gaétan Lepage	159c0bf25e	test_kernel_cache_in_action: fix test (#7792)	2024-11-19 13:34:56 -05:00
Eitan Turok	56017c52a0	Raise error when model architecture does not match state dict (#7772) * init * style * style * style * fix test	2024-11-20 00:11:54 +08:00
George Hotz	d71fe7faa5	rename allocator methods to not conflict [pr] (#7788) * rename allocator methods to not conflict [pr] * forgot those * transfer + offset	2024-11-20 00:10:29 +08:00
geohotstan	aeaf574a05	add failure test for setitem bug (#7786) * add failure test * rename * improve tests * improve tests and no need numpy	2024-11-19 08:54:21 -05:00
qazal	1e31b5ba6b	hotfix: ctx doesn't impact process replay [pr] (#7785)	2024-11-19 20:17:01 +08:00
chenyu	26200574dc	load_state_dict test cases when model and data shard differently (#7774) current behavior is weird... when model is sharded and state_dict is not, load shards the state_dict and model shard axis does not change. but if model and state_dict are sharded differently, model shard axis becomes the state_dict axis after load. it should either always use model shard axis or always use state_dict shard	2024-11-18 16:08:24 -05:00
Francis Lata	a1c1b9547f	Context manager support for tqdm (#7770) * add context manager support * add test case for context manager usage	2024-11-18 14:12:03 -05:00
geohotstan	8100109c9d	Add replicate mode to Tensor.pad (#7608) * base implementation * add tests * actually remove the assertionerror test * actually only have reflect for this pr * change the 4 if-else one liner * maybe use a lambda * fix * maybe a lil cleaner * fix tests * complete * small change --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-11-18 10:55:38 -05:00
chenyu	66d7d5af50	fix Tensor(MultiLazyBuffer) with different dtype should fail (#7757) similar to Tensor(LazyBuffer) as we don't cast implicitly	2024-11-17 21:05:45 -05:00
chenyu	df817297b6	fix passing acc_dtype="" to Tensor.prod should fail (#7750) similar to sum	2024-11-17 11:38:13 -05:00
chenyu	55707fd00d	fix passing sum_acc_dtype="" to Tensor.sum should fail (#7748)	2024-11-17 10:58:41 -05:00
qazal	99024b922b	to_uop one path for all ops part 1 (#7745) * flat meta ops * one path for everything * add tests * view is always base * just run	2024-11-17 20:12:44 +08:00
chenyu	a15a900415	fix Tensor.meshgrid for 1D input and check indexing (#7740)	2024-11-16 23:39:30 -05:00
geohotstan	72a41095bc	add Tensor.meshgrid (#7714) * initial implementation and test * some other places that can use meshgrid * revert the onnx_ops change * add to docs * revert interpolate too * update * improve edge case test * might as well test grad * add to test can improve docs --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-11-16 23:06:47 -05:00
chenyu	f1efd84c92	fix repeat_interleave with negative dim (#7734)	2024-11-16 10:15:29 -05:00
chenyu	e3105675fb	cond.where(True, False) is cond (#7733)	2024-11-16 09:44:17 -05:00
ignaciosica	597a239e28	Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] (#7725) * remove unaryops * remove ternaryops * remove metaops * hotfix * remove binaryops * hotfix: test_pattern_matcher --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2024-11-16 20:56:56 +08:00
chenyu	22da31b223	clean up Tensor.dot (#7728) more docs (similar to numpy) and removed many confusing `-min(n2, 2)`	2024-11-15 18:21:15 -05:00
chenyu	4338c450ac	fix max_pool2d for int tensor with padding (#7726) padding inf messed output dtype	2024-11-15 16:22:11 -05:00
chenyu	aeb1301bab	enable a few tests that work now (#7721) should mark the ones that are expected to work with expectedFailure, and delete and ones that are not expected to work	2024-11-15 14:30:52 -05:00
qazal	bddee26114	Ops.VALID cleanup, move recursive tests [pr] (#7713)	2024-11-15 20:22:46 +08:00
qazal	703a255301	use the method_cache in test_schedule [pr] (#7712) * use the method_cache in test_schedule [pr] * need half	2024-11-15 19:20:47 +08:00
qazal	88f760cc32	test_two_sum doesn't need del (#7711)	2024-11-15 18:50:08 +08:00
George Hotz	9b1605eef9	Revert "objdump intel syntax (#7605)" (#7707) This reverts commit `8f8e375f27`.	2024-11-15 12:13:04 +08:00
ttomsa	8f8e375f27	objdump intel syntax (#7605) * objdump intel syntax * test for objdump intel syntax * add disassemble to ClangCompiler and LLVMCompiler. Use just llvm-objdump * linter	2024-11-15 11:32:23 +08:00
chenyu	9fb396f660	test_ops maxpool2d -> max_pool2d (#7696) and avgpool2d -> avg_pool2d for better grepping the tests	2024-11-14 10:39:12 -05:00
geohotstan	f8056a74d6	combine pad2d with pad (#7677) * I have pad2d, I have pad, uuh~, pad2dpad~ * fix some small things * strategically placed cast hack * fix more * fix more more * tests * periods	2024-11-14 17:56:02 +08:00
qazal	0914c2fec9	add TestLinearizerFailures test_failure_56 and test_failure_57 (#7682) * add test_failure_56 and test_failure_57 * so it's only METAL=1	2024-11-14 12:00:33 +08:00
chenyu	333f5f9f8b	Tensor.bitwise_not (#7688) implemented with xor in tensor for now to not add another op. also used it in Tensor.min to fix dtype int on -2**31	2024-11-13 16:31:52 -05:00
chenyu	fb933b79a6	add test case for nll_loss with input > 2D (#7685) * failed test case for nll_loss with input > 2D * fixed * add more	2024-11-13 14:34:07 -05:00
geohotstan	9c41c376d3	add Tensor.nll_loss (#7683) * move nll_loss to new branch * make nll_loss examples practical * self is * add to docs * small	2024-11-13 13:12:13 -05:00
chenyu	3c6fe4b79a	fix Tensor.bitwise_and and Tensor.bitwise_or to support bool (#7684)	2024-11-13 13:10:39 -05:00
chenyu	3d82f8e340	simpler rand_like (#7680)	2024-11-13 12:28:41 -05:00
James	d4e4a084a1	fix: Tensor min function for unsigned ints (#7675) * add failing tests for uint8 `min()` * fix unsigned data type min() * fix test data * fix whitespace --------- Co-authored-by: rezaarezvan <reza@rezvan.xyz> Co-authored-by: Jamesb <experimentallearning0@gmail.com>	2024-11-13 11:04:27 -05:00
chenyu	d1dfd598a2	assert specifying device to rand_like a multi tensor (#7678) * assert specifying device to rand_like a multi tensor raise RuntimeError instead of dropping it silently * fix that	2024-11-13 10:24:40 -05:00
chenyu	51432bfbff	add rand_like test case with device specified (#7663) in single device or copied multi case, device is applied. but for sharded case the device is silently ignored now. maybe similar to rand we just don't allow tuple device in rand_like	2024-11-13 09:32:55 -05:00
Reza Rezvan	23363dee55	Add: failing tests for uint8 `min()` (#7669) * add failing tests for uint8 `min()` * mark as expected failure	2024-11-13 22:12:53 +08:00
qazal	e84d089ef1	delete ReduceOps, only use REDUCE_AXIS (#7667)	2024-11-13 19:04:27 +08:00
chenyu	1884f021e3	add conv3x3 to speed_v_theoretical (#7658) * add conv3x3 to speed_v_theoretical * show test duration	2024-11-12 16:41:56 -05:00
chenyu	962dafb467	use randn in speed_v_theoretical instead of rand (#7656) * use randn in speed_v_theoretical instead of rand this made green gemv 20% faster... but why? * update threshold	2024-11-12 15:00:32 -05:00
chenyu	6159790ab8	add gemv to speed_v_theoretical (#7654) * add gemv to speed_v_theoretical getting ~300GB/s if we just count the memory of inputs and output * better green numbers * flip	2024-11-12 11:19:35 -05:00

2902 Commits