Francis Lam
c91b7b1739
test: add fuzz_matmul and better debugging for simple_matmul (#4199)
also show unoptimized shape in verify_kernel
2024-04-16 23:40:31 -04:00
qazal
ba8602612b
Fuzz all permutations of schedule (#4136)
* simple toposort
* fuzzer
* init in_degree
* move to tests
* same seed
* configure paths
* internal graph
* compare LazyBuffers
* simpler
* simple graph
* assign works
* simpler
* fix JIT
* upstream ci
* move ci
* fix the path
* DEBUG=1
* limit max paths
* launch a cmp kernel
* Revert "launch a cmp kernel"
This reverts commit 791c608992.
* exec ground truth
* better perf
* copy ground truth once
* gpu allclose ast try1
* Revert "gpu allclose ast try1"
This reverts commit 1f82103af3.
* prerealized bufs freezing
* teeny cleanups
* reuse Buffers
* Revert "reuse Buffers"
This reverts commit a71de94b03.
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-17 05:03:21 +04:00
David Hou
97d846dd67
in forced_realize, unchase last op if it is upcast (#4185)
* in forced_realize, unchase last op if it is upcast
* start on test
* flesh out test
* more test
* comment
* comment out parallel reduce test
* reorder
* unused
2024-04-16 17:15:17 -04:00
Francis Lam
e9c1616b27
logging: change LOGKERN to LOGKERNS to match LOGOPS (#4193)
also add printing of ast and applied_opts during verify_kernel
to more easily debug errors if they come up
2024-04-16 16:08:32 -04:00
David Hou
7fb220a567
touchup resnet_layer_bench (#4191)
2024-04-16 14:43:00 -04:00
David Hou
1dbf3b2b19
Benchmarks for individual resnet layers (#4182)
* resnet individual layer benchmarks!
* small
* 1 and 2
* mem_used
* no ci
* better conv print
* defaults
* prints
* adjust
* adjust
* adjust
* benchmark only one layer example
* tensor.training, zero_grad, sum instead of mean, last mem, last kernel count
* default jitcnt=1
* scale flops/kernels with jitcnt
* add note about jitcnt memory
* touchup
2024-04-16 13:53:18 -04:00
George Hotz
55ae73e951
Replicate llm.c in tinygrad (#4179)
* write llm.c and add a few new methods to tensor
* training works
* add jit
* tests for new functions
* test tolist
* simple fix for onnx test failures (#4186)
* write llm.c and add a few new methods to tensor
* training works
* add jit
* tests for new functions
* bump line count to 7500
* simplest fix
* safenumpy tolist for now
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
---------
Co-authored-by: geohotstan <135171913+geohotstan@users.noreply.github.com>
2024-04-16 15:40:48 +04:00
George Hotz
b6e7243bfa
hotfix: skip slow pre-commit test
2024-04-16 11:48:43 +04:00
George Hotz
50e780a588
multitensor shouldn't recompile (#4164)
* multitensor shouldn't recompile
* type annotations
* fix tests
* outcount in reduce
2024-04-13 00:03:48 -07:00
George Hotz
ba7314c26b
cleanup lbs (#4163)
2024-04-12 22:32:16 -07:00
chenyu
a7c6864260
remove CAST_BEFORE_VIEW (#4152)
* remove CAST_BEFORE_VIEW
testing perf, also this might have issue with assign?
* remove all
2024-04-13 01:05:08 -04:00
George Hotz
ebc94c9d6c
rewrite the jit in the context of new schedule (#4162)
* rewrite the jit in the context of new schedule
* mypy better
* fix placeholder
* tests
* all functionality should work
* fix tests
* no CacheCollector
2024-04-12 21:54:36 -07:00
chenyu
63eb0a68af
fix return dtype of gather (#4159)
2024-04-12 16:25:12 -04:00
chenyu
d9c5a2b1bb
fix return dtype of getitem Tensor indexing (#4158)
the use of sum can auto-upcast the result. fixed by using the data dtype as the acc_dtype
2024-04-12 15:55:02 -04:00
chenyu
f6c8032e5d
assert if expr_idxs return might be outside of int32 (#4157)
2024-04-12 14:18:35 -04:00
chenyu
380f27d629
move sum acc_dtype into lazy so it applies to backward (#4149)
* move sum acc_dtype into lazy so it applies to backward
* unit test
2024-04-11 14:43:56 -04:00
George Hotz
bbda20c0db
CompiledASTRunner -> CompiledRunner (#4148)
2024-04-11 08:49:52 -07:00
George Hotz
b7e281cf10
JitItem -> ExecItem (#4146)
* JitItem -> ExecItem
* execitem in realize
* cleaner
* JITRunner -> Runner
2024-04-11 08:24:57 -07:00
chenyu
06bcae13b4
PADTO SUM if parents of sum are all zero-preserving (#4140)
* PADTO SUM if parents of sum are all zero-preserving
* test case unsafe ops after sum is fine
* reuse UNSAFE_PAD_OPS
* update db version
2024-04-10 22:16:12 -04:00
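The PADTO SUM change rests on a simple invariant: zero-padding the input of a sum leaves the result unchanged only if every op between the padding and the sum maps 0 to 0. A minimal pure-Python sketch of that reasoning, not tinygrad's implementation; `padded_sum`, `is_zero_preserving`, and the sample ops are hypothetical names chosen for illustration:

```python
import math

def padded_sum(xs, op, pad_to):
    """Hypothetical helper: zero-pad xs to length pad_to, apply op, then sum."""
    padded = list(xs) + [0.0] * (pad_to - len(xs))
    return sum(op(x) for x in padded)

def is_zero_preserving(op):
    # Padding through op is safe only if op(0) == 0, so pad slots add nothing to the sum.
    return op(0.0) == 0.0

def relu(x):
    return max(x, 0.0)  # relu(0) == 0: safe to pad

xs = [1.0, 2.0, 3.0]
for name, op in [("relu", relu), ("exp", math.exp)]:  # exp(0) == 1: unsafe
    unchanged = math.isclose(padded_sum(xs, op, 8), sum(op(x) for x in xs))
    print(name, is_zero_preserving(op), unchanged)
```

Ops applied after the sum only see the reduced value, which is consistent with the commit's note that unsafe ops after the sum are fine.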
terafo
5e6d2155e4
Add driving monitoring model to benchmarks (#4134)
* add driving monitoring model to benchmarks
* handle crash
2024-04-10 14:27:03 -04:00
geohotstan
fe88591890
update onnx to 1.16.0 (#4127)
* update
* pass tests and skip tests
2024-04-10 11:19:13 -04:00
chenyu
6bbbeb93ac
skip a few clang tests that took > 30 seconds in CI (#4126)
* skip slow CLANG test test_train_cifar
* skip those too
* and that
* only CI
* one more
2024-04-10 02:00:34 -04:00
qazal
42edae8935
pickle schedules (#4114)
* pickle schedules
* Update test_pickle.py
* Update test_pickle.py
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-09 13:47:25 -07:00
George Hotz
ae849d12d7
numpy device + pickle it (#4120)
2024-04-09 13:19:30 -07:00
David González Martínez
980124a605
add lerp operation to tensor (#4102)
* feat: add lerp operation to tensor
* fix
* style: fit in one line:
* tests: test backward for lerp
2024-04-08 17:03:27 -07:00
chenyu
dbd39ab78a
setitem support setting python const (#4111)
2024-04-08 11:37:50 -04:00
chenyu
92c0675ccf
setitem initial support (#4093)
* wip setitem
it's an eager assign to output shapetracker view
* cleanups and tests
* more cleanups
2024-04-07 20:35:22 -04:00
geohotstan
183708b3fd
broadcast expand to match torch (#4085)
* initial version
* heh gimme grrrreen
* version 2
* clean ups
* some test confusion
* fix onnx
* rename to _broadcast_tensors
* improved errors and test
* fixed?
* some test fixup
* version 3 lol
* comments
* cleaner
* add failure test for expand to 0 test
* 1 more assertRaises test
* make err msg better
* also rewrite the expand onnx op? :s
2024-04-07 16:23:13 -04:00
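Matching torch here means following the standard broadcasting rule: align shapes on their trailing dimensions, and in each position the sizes must be equal or one of them must be 1, which then stretches (even to 0). A minimal sketch of that rule, assuming NumPy/PyTorch-style semantics; `broadcast_shape` is a hypothetical helper, not tinygrad's `_broadcast_tensors`:

```python
from itertools import zip_longest

def broadcast_shape(*shapes):
    """Hypothetical helper: result shape under NumPy/PyTorch-style broadcasting."""
    out = []
    # Align shapes on their trailing dimensions; missing leading dims count as 1.
    for dims in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        non_one = {d for d in dims if d != 1}
        if len(non_one) > 1:
            raise ValueError(f"shapes {shapes} are not broadcastable")
        # A size-1 dim stretches to the other size, including 0.
        out.append(non_one.pop() if non_one else 1)
    return tuple(reversed(out))

print(broadcast_shape((5, 3, 1), (4,)))  # (5, 3, 4)
print(broadcast_shape((1,), (0,)))       # (0,)
```

Under this rule a non-1 size can never be stretched, so a shape like `(2,)` cannot broadcast against `(0,)` or `(3,)`.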
uuuvn
2b81d9b334
Fix broken test (#4104)
2024-04-07 12:02:12 -04:00
uuuvn
bb7567b365
Fix metal (#4101)
2024-04-07 05:21:19 -07:00
chenyu
bdbcac67f1
assign jit test case with other tensor as input (#4098)
hmm it works
2024-04-06 14:41:14 -04:00
George Hotz
164329a8ea
address kfd feedback (#4087)
* address kfd feedback
* signals cleanup
* signals cleanup
* handle 2 doorbell pages correctly
* signal reset cleanup
* signals cleanup
* more GTT
* cleanups
* minor cleanups
2024-04-05 15:24:41 -07:00
Akshit Talwar
750ecf8fef
replace slice by pad/shrink in _pool (#4082)
2024-04-05 11:47:22 -04:00
George Hotz
a337922c44
more work on kfd (#4079)
* more work on kfd
* fix multitensor test on kfd
* stuff
2024-04-05 08:36:36 -07:00
chenyu
e7ff5102cf
failed test in test_pattern_matcher (#4080)
something about the PTX rewrite is incorrect: it produces duplicated rewritten uops
2024-04-05 02:53:50 -04:00
George Hotz
3de855ea50
don't use SVM memory in KFD (#4072)
* don't use SVM memory in KFD
* copy from fd
* cleanups
* transfer
* hacks
* ops_hsa
* tighter API
2024-04-04 17:33:21 -07:00
chenyu
c1cffed1df
add LazyOp.dtype (#4073)
an inferred cached_property.
removed all cases that use get_lazyop_info just to get the dtype of an op.
prereq to remove InterpretedFlopCounter
2024-04-04 17:38:19 -04:00
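The commit describes `LazyOp.dtype` as an inferred cached_property, so callers no longer need `get_lazyop_info` just to read an op's dtype. A minimal sketch of that pattern on a toy node class; `Op`, its op names, the string dtypes, and the inference rules are all hypothetical stand-ins, not tinygrad's LazyOp:

```python
from __future__ import annotations
from dataclasses import dataclass
from functools import cached_property
from typing import Any, Tuple

@dataclass
class Op:
    """Hypothetical AST node for illustration; not tinygrad's LazyOp."""
    name: str
    srcs: Tuple[Op, ...] = ()
    arg: Any = None

    @cached_property
    def dtype(self) -> str:
        # Leaves and casts carry their dtype in arg.
        if self.name in ("LOAD", "CONST", "CAST"):
            return self.arg
        # Comparisons always produce bools.
        if self.name in ("CMPLT", "CMPEQ"):
            return "bool"
        # Other ops inherit the dtype of their first source.
        return self.srcs[0].dtype

x = Op("LOAD", arg="float32")
y = Op("ADD", (x, Op("CONST", arg="float32")))
print(y.dtype, Op("CMPLT", (y, x)).dtype, Op("CAST", (y,), "half").dtype)  # float32 bool half
```

`functools.cached_property` computes the value on first access and stores it in the instance's `__dict__`, so repeated lookups on a deep graph do not re-walk the sources.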
Szymon Ożóg
82b7b9655f
test for dtype set (#4069)
2024-04-04 11:24:33 -04:00
geohotstan
1a1dd1c1a7
add and enable tests for indexing const folding (#4068)
* enable test in test_indexing
* added tests
* rename stuff
* del a test case cuz it's loadops.copy
2024-04-04 10:46:28 -04:00
Szymon Ożóg
ba118abfec
improved caching for pointer arithmetics in ptx (#3922)
* improved caching for pointer arithmetics
* Add test for pointer arithmetics caching
* Refactor test
2024-04-04 07:33:48 -07:00
George Hotz
7181ffd630
HWCopyQueue in KFD (#4042)
* HWCopyQueue in KFD
* hw compute queue
* test
* move test
* more tests
* fix wait
* fix multimap
* mes crash
* tests pass but slow
* stuff is working
* one more test
2024-04-03 20:14:24 -07:00
chenyu
e3c0ac9fbf
remove old envvar "OPT" (#4060)
2024-04-03 14:55:21 -04:00
chenyu
406cb5fd90
const fold ReduceOps (#4059)
2024-04-03 14:39:28 -04:00
chenyu
fe03725b21
const fold cast unrealized_unpadded_const (#4047)
* const fold unrealized_unpadded_const
changed the underlying arg directly
* CAST_BEFORE_VIEW folds some
* fix const index in getitem
2024-04-03 12:31:24 -04:00
Szymon Ożóg
e5a9bff899
Add pattern matcher tests, move uop transforms from assembly to pattern matcher (#4056)
2024-04-03 09:06:43 -07:00
chenyu
f61ed869f5
Use exec_alu for lazy const folding (#4039)
2024-04-02 20:52:05 -04:00
chenyu
85edc493b0
uops const fold rules to prevent tautological compare warnings (#4041)
* uops const fold rules to prevent tautological compare warnings
`bool < false` is false, `true < bool` is false, `a == a` is true, `a != a` is false
* not true for nan
* and nan does not work with llvm
* full truth table test
* revert a==a
* comments and indents
2024-04-02 16:45:58 -04:00
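Because a bool has only two values, the fold rules listed above can be checked exhaustively. A minimal sketch in plain Python semantics (not tinygrad's UOps or pattern matcher) that builds the full truth table for `<` on bools and shows the NaN case that makes `a == a` and `a != a` unsafe to fold for floats, consistent with the later "revert a==a" bullet:

```python
import math
from itertools import product

BOOLS = (False, True)

# Full truth table of `<` on bools: only False < True holds.
table = {(a, b): a < b for a, b in product(BOOLS, repeat=2)}
print(table)

# Hence `x < False` and `True < x` fold to False for any bool x.
assert all(not (x < False) for x in BOOLS)
assert all(not (True < x) for x in BOOLS)

# Self-comparison folds are not safe for floats: NaN is unequal to itself.
nan = float("nan")
print(nan == nan, nan != nan, math.isnan(nan))  # False True True
```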
Patrick Tsai
0147174ad6
Embedding in one kernel (#4036)
* Embedding is in one kernel
* embedding is one kernel
* rm extra line
* newline
* bert test counts state vars?
* add a test?
* move items around
---------
Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-04-02 11:38:21 -04:00
Dan Hoffman
5311b45053
re-enable has_local check for linearizer test (#4034)
Co-authored-by: Dan Hoffman <daniel.hoffman@intel.com>
2024-04-02 00:02:03 -04:00
George Hotz
7425a0c646
CommandQueue is the future (#3950)
* start of command queue
* cq work
* runs
* cleanup
* outs set
* read is gone
* future buffer work
* command queue is better
* command queue works
* loadops
* delete unneeded
* command queue works
* upd
* fix tests
* use CommandQueue in compile
* delay sync
2024-04-01 17:35:48 -07:00