George Hotz
0337a70a28
BufferSpec and ProgramSpec [pr]
2024-11-21 12:03:56 +08:00
George Hotz
9df5a62c5e
unify to HWQueue [pr] ( #7812 )
...
* unify to HWCommandQueue [pr]
* all is HWQueue
2024-11-21 10:33:08 +08:00
chenyu
11cea00090
lower vs_theoretical conv tflops threshold for nv ( #7811 )
...
less flaky
2024-11-20 20:03:49 -05:00
ignaciosica
fc3154a7b3
metal bf16 tc support [pr] ( #7408 )
...
* add bf16 tc for metal
* hotfix: spacing
* fix tolerance and skip metal bf16 in ci
* hotfix: check for dtype_out
* hotfix: add check for tc.dtype_out is bf16 back
* hotfix: add parens
2024-11-20 14:39:08 -05:00
geohotstan
66a069ee25
add replicate mode to Tensor.pad ( #7802 )
...
* base implementation
* add tests
* actually remove the assertionerror test
* good
2024-11-20 08:39:58 -05:00
George Hotz
eb0bb7dc0b
final dname to device [pr] ( #7806 )
...
* final dname to device [pr]
* oops, fix nv
2024-11-20 20:20:28 +08:00
George Hotz
bc977fec53
dname -> device [pr] ( #7804 )
...
* dname -> device [pr]
* a few more
* only one left
2024-11-20 17:57:14 +08:00
ttomsa
9adeb1041c
fix advanced setitem with 1 in shape ( #7797 )
...
* fix advanced setitem with 1 in shape
* linter
2024-11-19 20:04:59 -05:00
ttomsa
170ece6605
fix advanced setitem overlap with 0 ( #7793 )
...
* fix advanced setitem overlap with 0
* fix comment
2024-11-19 16:03:55 -05:00
Gaétan Lepage
159c0bf25e
test_kernel_cache_in_action: fix test ( #7792 )
2024-11-19 13:34:56 -05:00
Eitan Turok
56017c52a0
Raise error when model architecture does not match state dict ( #7772 )
...
* init
* style
* style
* style
* fix test
2024-11-20 00:11:54 +08:00
George Hotz
d71fe7faa5
rename allocator methods to not conflict [pr] ( #7788 )
...
* rename allocator methods to not conflict [pr]
* forgot those
* transfer + offset
2024-11-20 00:10:29 +08:00
geohotstan
aeaf574a05
add failure test for setitem bug ( #7786 )
...
* add failure test
* rename
* improve tests
* improve tests and no need numpy
2024-11-19 08:54:21 -05:00
qazal
1e31b5ba6b
hotfix: ctx doesn't impact process replay [pr] ( #7785 )
2024-11-19 20:17:01 +08:00
chenyu
26200574dc
load_state_dict test cases when model and data shard differently ( #7774 )
...
current behavior is weird... when model is sharded and state_dict is not, load shards the state_dict and model shard axis does not change.
but if model and state_dict are sharded differently, model shard axis becomes the state_dict axis after load.
it should either always use model shard axis or always use state_dict shard
2024-11-18 16:08:24 -05:00
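The inconsistency described in #7774 can be restated as a toy model. This is only a sketch of the behavior as the commit message words it, not tinygrad code: `axis_after_load` and its arguments are hypothetical names, and the case of an unsharded model is not mentioned in the message, so it is left out here.

```python
from typing import Optional

def axis_after_load(model_axis: int, state_axis: Optional[int]) -> int:
  """Shard axis of the model after load, as described in the commit message (hypothetical helper)."""
  # state_dict not sharded: load shards it to match, the model axis is unchanged
  if state_axis is None: return model_axis
  # state_dict sharded: the model ends up on the state_dict axis
  return state_axis

# model sharded on axis 0, plain state_dict: the model axis is kept
print(axis_after_load(0, None))
# model sharded on axis 0, state_dict sharded on axis 1: the state_dict axis takes over
print(axis_after_load(0, 1))
```

Either of the two fixes the message proposes would collapse this into a single rule: always return `model_axis`, or always follow the state_dict.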
Francis Lata
a1c1b9547f
Context manager support for tqdm ( #7770 )
...
* add context manager support
* add test case for context manager usage
2024-11-18 14:12:03 -05:00
geohotstan
8100109c9d
Add replicate mode to Tensor.pad ( #7608 )
...
* base implementation
* add tests
* actually remove the assertionerror test
* actually only have reflect for this pr
* change the 4 if-else one liner
* maybe use a lambda
* fix
* maybe a lil cleaner
* fix tests
* complete
* small change
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-18 10:55:38 -05:00
chenyu
66d7d5af50
fix Tensor(MultiLazyBuffer) with different dtype should fail ( #7757 )
...
similar to Tensor(LazyBuffer) as we don't cast implicitly
2024-11-17 21:05:45 -05:00
chenyu
df817297b6
fix passing acc_dtype="" to Tensor.prod should fail ( #7750 )
...
similar to sum
2024-11-17 11:38:13 -05:00
chenyu
55707fd00d
fix passing sum_acc_dtype="" to Tensor.sum should fail ( #7748 )
2024-11-17 10:58:41 -05:00
qazal
99024b922b
to_uop one path for all ops part 1 ( #7745 )
...
* flat meta ops
* one path for everything
* add tests
* view is always base
* just run
2024-11-17 20:12:44 +08:00
chenyu
a15a900415
fix Tensor.meshgrid for 1D input and check indexing ( #7740 )
2024-11-16 23:39:30 -05:00
geohotstan
72a41095bc
add Tensor.meshgrid ( #7714 )
...
* initial implementation and test
* some other places that can use meshgrid
* revert the onnx_ops change
* add to docs
* revert interpolate too
* update
* improve edge case test
* might as well test grad
* add to test can improve docs
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-16 23:06:47 -05:00
chenyu
f1efd84c92
fix repeat_interleave with negative dim ( #7734 )
2024-11-16 10:15:29 -05:00
chenyu
e3105675fb
cond.where(True, False) is cond ( #7733 )
2024-11-16 09:44:17 -05:00
ignaciosica
597a239e28
Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] ( #7725 )
...
* remove unaryops
* remove ternaryops
* remove metaops
* hotfix
* remove binaryops
* hotfix: test_pattern_matcher
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-16 20:56:56 +08:00
chenyu
22da31b223
clean up Tensor.dot ( #7728 )
...
more docs (similar to numpy) and removed many confusing `-min(n2, 2)`
2024-11-15 18:21:15 -05:00
chenyu
4338c450ac
fix max_pool2d for int tensor with padding ( #7726 )
...
padding inf messed output dtype
2024-11-15 16:22:11 -05:00
chenyu
aeb1301bab
enable a few tests that work now ( #7721 )
...
should mark the ones that are expected to work with expectedFailure, and delete the ones that are not expected to work
2024-11-15 14:30:52 -05:00
qazal
bddee26114
Ops.VALID cleanup, move recursive tests [pr] ( #7713 )
2024-11-15 20:22:46 +08:00
qazal
703a255301
use the method_cache in test_schedule [pr] ( #7712 )
...
* use the method_cache in test_schedule [pr]
* need half
2024-11-15 19:20:47 +08:00
qazal
88f760cc32
test_two_sum doesn't need del ( #7711 )
2024-11-15 18:50:08 +08:00
George Hotz
9b1605eef9
Revert "objdump intel syntax ( #7605 )" ( #7707 )
...
This reverts commit 8f8e375f27.
2024-11-15 12:13:04 +08:00
ttomsa
8f8e375f27
objdump intel syntax ( #7605 )
...
* objdump intel syntax
* test for objdump intel syntax
* add disassemble to ClangCompiler and LLVMCompiler. Use just llvm-objdump
* linter
2024-11-15 11:32:23 +08:00
chenyu
9fb396f660
test_ops maxpool2d -> max_pool2d ( #7696 )
...
and avgpool2d -> avg_pool2d for better grepping the tests
2024-11-14 10:39:12 -05:00
geohotstan
f8056a74d6
combine pad2d with pad ( #7677 )
...
* I have pad2d, I have pad, uuh~, pad2dpad~
* fix some small things
* strategically placed cast hack
* fix more
* fix more more
* tests
* periods
2024-11-14 17:56:02 +08:00
qazal
0914c2fec9
add TestLinearizerFailures test_failure_56 and test_failure_57 ( #7682 )
...
* add test_failure_56 and test_failure_57
* so it's only METAL=1
2024-11-14 12:00:33 +08:00
chenyu
333f5f9f8b
Tensor.bitwise_not ( #7688 )
...
implemented with xor in tensor for now to not add another op. also used it in Tensor.min to fix dtype int on -2**31
2024-11-13 16:31:52 -05:00
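The two ideas in #7688 (bitwise NOT expressed as XOR, and using it for `min` on signed ints) can be illustrated without tinygrad. The sketch below emulates int32 wraparound in plain Python. All names (`wrap_int32`, `min_via_neg`, `min_via_not`) are illustrative, and it assumes the earlier `min` was computed as "negate, take max, negate", which overflows at -2**31.

```python
INT32_MIN = -2**31

def wrap_int32(x: int) -> int:
  """Wrap a Python int into the signed 32-bit two's complement range."""
  x &= 0xFFFFFFFF
  return x - (1 << 32) if x & 0x80000000 else x

def bitwise_not(x: int) -> int:
  # NOT is XOR with all ones (-1 in two's complement), so no separate op is needed
  return wrap_int32(x ^ -1)

def min_via_neg(values: list[int]) -> int:
  # -INT32_MIN is not representable in int32 and wraps back to INT32_MIN
  return wrap_int32(-max(wrap_int32(-v) for v in values))

def min_via_not(values: list[int]) -> int:
  # ~x reverses the ordering of int32 values without overflowing
  return bitwise_not(max(bitwise_not(v) for v in values))

values = [INT32_MIN, 5, -3]
print(min_via_neg(values))  # wrong result: the wrapped negation hides INT32_MIN
print(min_via_not(values))  # agrees with Python's built-in min(values)
```

The NOT-based form works because `~x == -x - 1` maps every int32 value to another int32 value while reversing their order.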
chenyu
fb933b79a6
add test case for nll_loss with input > 2D ( #7685 )
...
* failed test case for nll_loss with input > 2D
* fixed
* add more
2024-11-13 14:34:07 -05:00
geohotstan
9c41c376d3
add Tensor.nll_loss ( #7683 )
...
* move nll_loss to new branch
* make nll_loss examples practical
* self *is*
* add to docs
* small
2024-11-13 13:12:13 -05:00
chenyu
3c6fe4b79a
fix Tensor.bitwise_and and Tensor.bitwise_or to support bool ( #7684 )
2024-11-13 13:10:39 -05:00
chenyu
3d82f8e340
simpler rand_like ( #7680 )
2024-11-13 12:28:41 -05:00
James
d4e4a084a1
fix: Tensor min function for unsigned ints ( #7675 )
...
* add failing tests for uint8 `min()`
* fix unsigned data type min()
* fix test data
* fix whitespace
---------
Co-authored-by: rezaarezvan <reza@rezvan.xyz>
Co-authored-by: Jamesb <experimentallearning0@gmail.com>
2024-11-13 11:04:27 -05:00
chenyu
d1dfd598a2
assert specifying device to rand_like a multi tensor ( #7678 )
...
* assert specifying device to rand_like a multi tensor
raise RuntimeError instead of dropping it silently
* fix that
2024-11-13 10:24:40 -05:00
chenyu
51432bfbff
add rand_like test case with device specified ( #7663 )
...
in single device or copied multi case, device is applied. but for sharded case the device is silently ignored now. maybe similar to rand we just don't allow tuple device in rand_like
2024-11-13 09:32:55 -05:00
Reza Rezvan
23363dee55
Add: failing tests for uint8 min() ( #7669 )
...
* add failing tests for uint8 `min()`
* mark as expected failure
2024-11-13 22:12:53 +08:00
qazal
e84d089ef1
delete ReduceOps, only use REDUCE_AXIS ( #7667 )
2024-11-13 19:04:27 +08:00
chenyu
1884f021e3
add conv3x3 to speed_v_theoretical ( #7658 )
...
* add conv3x3 to speed_v_theoretical
* show test duration
2024-11-12 16:41:56 -05:00
chenyu
962dafb467
use randn in speed_v_theoretical instead of rand ( #7656 )
...
* use randn in speed_v_theoretical instead of rand
this made green gemv 20% faster... but why?
* update threshold
2024-11-12 15:00:32 -05:00
chenyu
6159790ab8
add gemv to speed_v_theoretical ( #7654 )
...
* add gemv to speed_v_theoretical
getting ~300GB/s if we just count the memory of inputs and output
* better green numbers
* flip
2024-11-12 11:19:35 -05:00
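The "~300GB/s if we just count the memory of inputs and output" figure in #7654 is a bandwidth estimate. A minimal sketch of that arithmetic follows. The function names and the sizes in the usage lines are made up for illustration and are not taken from the test itself.

```python
def gemv_bytes(rows: int, cols: int, itemsize: int) -> int:
  """Bytes touched by y = A @ x when each input and output is counted once."""
  # matrix A (rows*cols) + input vector x (cols) + output vector y (rows)
  return (rows * cols + cols + rows) * itemsize

def bandwidth_gbps(nbytes: int, seconds: float) -> float:
  """Effective memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
  return nbytes / seconds / 1e9

# hypothetical sizes and timing, only to show how the number is formed
nbytes = gemv_bytes(rows=4096, cols=4096, itemsize=4)
print(f"{bandwidth_gbps(nbytes, seconds=250e-6):.1f} GB/s")
```

For gemv the matrix term dominates, so the estimate is close to `rows * cols * itemsize / seconds`.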