Commit Graph

1652 Commits

Author SHA1 Message Date
Francis Lam
c91b7b1739 test: add fuzz_matmul and better debugging for simple_matmul (#4199)
also show unoptimized shape in verify_kernel
2024-04-16 23:40:31 -04:00
qazal
ba8602612b Fuzz all permutations of schedule (#4136)
* simple toposort

* fuzzer

* init in_degree

* move to tests

* same seed

* configure paths

* internal graph

* compare LazyBuffers

* simpler

* simple graph

* assign works

* simpler

* fix JIT

* upstream ci

* move ci

* fix the path

* DEBUG=1

* limit max paths

* launch a cmp kernel

* Revert "launch a cmp kernel"

This reverts commit 791c608992.

* exec ground truth

* better perf

* copy ground truth once

* gpu allclose ast try1

* Revert "gpu allclose ast try1"

This reverts commit 1f82103af3.

* prerealized bufs freezing

* teeny cleanups

* reuse Buffers

* Revert "reuse Buffers"

This reverts commit a71de94b03.

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-17 05:03:21 +04:00
David Hou
97d846dd67 in forced_realize, unchase last op if it is upcast (#4185)
* in forced_realize, unchase last op if it is upcast

* start on test

* flesh out test

* more test

* comment

* comment out parallel reduce test

* reorder

* unused
2024-04-16 17:15:17 -04:00
Francis Lam
e9c1616b27 logging: change LOGKERN to LOGKERNS to match LOGOPS (#4193)
also add printing of ast and applied_opts during verify_kernel
to more easily debug errors if they come up
2024-04-16 16:08:32 -04:00
David Hou
7fb220a567 touchup resnet_layer_bench (#4191) 2024-04-16 14:43:00 -04:00
David Hou
1dbf3b2b19 Benchmarks for individual resnet layers (#4182)
* resnet individual layer benchmarks!

* small

* 1 and 2

* mem_used

* no ci

* better conv print

* defaults

* prints

* adjust

* adjust

* adjust

* benchmark only one layer example

* tensor.training, zero_grad, sum instead of mean, last mem, last kernel count

* default jitcnt=1

* scale flops/kernels with jitcnt

* add note about jitcnt memory

* touchup
2024-04-16 13:53:18 -04:00
George Hotz
55ae73e951 Replicate llm.c in tinygrad (#4179)
* write llm.c and add a few new methods to tensor

* training works

* add jit

* tests for new functions

* test tolist

* simple fix for onnx test failures (#4186)

* write llm.c and add a few new methods to tensor

* training works

* add jit

* tests for new functions

* bump line count to 7500

* simplest fix

* safenumpy tolist for now

---------

Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>

---------

Co-authored-by: geohotstan <135171913+geohotstan@users.noreply.github.com>
2024-04-16 15:40:48 +04:00
George Hotz
b6e7243bfa hotfix: skip slow pre-commit test 2024-04-16 11:48:43 +04:00
George Hotz
50e780a588 multitensor shouldn't recompile (#4164)
* multitensor shouldn't recompile

* type annotations

* fix tests

* outcount in reduce
2024-04-13 00:03:48 -07:00
George Hotz
ba7314c26b cleanup lbs (#4163) 2024-04-12 22:32:16 -07:00
chenyu
a7c6864260 remove CAST_BEFORE_VIEW (#4152)
* remove CAST_BEFORE_VIEW

testing perf, also this might have issue with assign?

* remove all
2024-04-13 01:05:08 -04:00
George Hotz
ebc94c9d6c rewrite the jit in the context of new schedule (#4162)
* rewrite the jit in the context of new schedule

* mypy better

* fix placeholder

* tests

* all functionality should work

* fix tests

* no CacheCollector
2024-04-12 21:54:36 -07:00
chenyu
63eb0a68af fix return dtype of gather (#4159) 2024-04-12 16:25:12 -04:00
chenyu
d9c5a2b1bb fix return dtype of getitem Tensor indexing (#4158)
the use of sum can auto-upcast the result. fixed by using the data dtype as the acc_dtype
2024-04-12 15:55:02 -04:00
chenyu
f6c8032e5d assert if expr_idxs return might be outside of int32 (#4157) 2024-04-12 14:18:35 -04:00
chenyu
380f27d629 move sum acc_dtype into lazy so it applies to backward (#4149)
* move sum acc_dtype into lazy so it applies to backward

* unit test
2024-04-11 14:43:56 -04:00
George Hotz
bbda20c0db CompiledASTRunner -> CompiledRunner (#4148) 2024-04-11 08:49:52 -07:00
George Hotz
b7e281cf10 JitItem -> ExecItem (#4146)
* JitItem -> ExecItem

* execitem in realize

* cleaner

* JITRunner -> Runner
2024-04-11 08:24:57 -07:00
chenyu
06bcae13b4 PADTO SUM if parents of sum are all zero-preserving (#4140)
* PADTO SUM if parents of sum are all zero-preserving

* test case unsafe ops after sum is fine

* reuse UNSAFE_PAD_OPS

* update db version
2024-04-10 22:16:12 -04:00
terafo
5e6d2155e4 Add driving monitoring model to benchmarks (#4134)
* add driving monitoring model to benchmarks

* handle crash
2024-04-10 14:27:03 -04:00
geohotstan
fe88591890 update onnx to 1.16.0 (#4127)
* update

* pass tests and skip tests
2024-04-10 11:19:13 -04:00
chenyu
6bbbeb93ac skip a few clang test that took > 30 seconds in CI (#4126)
* skip slow CLANG test test_train_cifar

* skip those too

* and that

* only CI

* one more
2024-04-10 02:00:34 -04:00
qazal
42edae8935 pickle schedules (#4114)
* pickle schedules

* Update test_pickle.py

* Update test_pickle.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-09 13:47:25 -07:00
George Hotz
ae849d12d7 numpy device + pickle it (#4120) 2024-04-09 13:19:30 -07:00
David González Martínez
980124a605 add lerp operation to tensor (#4102)
* feat: add lerp operation to tensor

* fix

* style: fit in one line:

* tests: test backward for lerp
2024-04-08 17:03:27 -07:00
chenyu
dbd39ab78a setitem support setting python const (#4111) 2024-04-08 11:37:50 -04:00
chenyu
92c0675ccf setitem initial support (#4093)
* wip setitem

it's an eager assign to output shapetracker view

* cleanups and tests

* more cleanups
2024-04-07 20:35:22 -04:00
geohotstan
183708b3fd broadcast expand to match torch (#4085)
* initial version

* heh gimme grrrreen

* version 2

* clean ups

* some test confusion

* fix onnx

* rename to _broadcast_tensors

* improved errors and test

* fixed?

* some test fixup

* version 3 lol

* comments

* cleaner

* add failure test for expand to 0 test

* 1 more assertRaises test

* make err msg better

* also rewrite the expand onnx op? :s
2024-04-07 16:23:13 -04:00
uuuvn
2b81d9b334 Fix broken test (#4104) 2024-04-07 12:02:12 -04:00
uuuvn
bb7567b365 Fix metal (#4101) 2024-04-07 05:21:19 -07:00
chenyu
bdbcac67f1 assign jit test case with other tensor as input (#4098)
hmm it works
2024-04-06 14:41:14 -04:00
George Hotz
164329a8ea address kfd feedback (#4087)
* address kfd feedback

* signals cleanup

* signals cleanup

* handle 2 doorbell pages correctly

* signal reset cleanup

* signals cleanup

* more GTT

* cleanups

* minor cleanups
2024-04-05 15:24:41 -07:00
Akshit Talwar
750ecf8fef replace slice by pad/shrink in _pool (#4082) 2024-04-05 11:47:22 -04:00
George Hotz
a337922c44 more work on kfd (#4079)
* more work on kfd

* fix multitensor test on kfd

* stuff
2024-04-05 08:36:36 -07:00
chenyu
e7ff5102cf failed test in test_pattern_matcher (#4080)
something about the PTX rewrite is incorrect that it has duplicated rewritten uops
2024-04-05 02:53:50 -04:00
George Hotz
3de855ea50 don't use SVM memory in KFD (#4072)
* don't use SVM memory in KFD

* copy from fd

* cleanups

* transfer

* hacks

* ops_hsa

* tighter API
2024-04-04 17:33:21 -07:00
chenyu
c1cffed1df add LazyOp.dtype (#4073)
an inferred cached_property.
removed all cases that use get_lazyop_info just to get the dtype of an op.
prereq to remove InterpretedFlopCounter
2024-04-04 17:38:19 -04:00
Szymon Ożóg
82b7b9655f test for dtype set (#4069) 2024-04-04 11:24:33 -04:00
geohotstan
1a1dd1c1a7 add and enable tests for indexing const folding (#4068)
* enable test in test_indexing

* added tests

* rename stuff

* del a test case cuz it's loadops.copy
2024-04-04 10:46:28 -04:00
Szymon Ożóg
ba118abfec improved caching for pointer arithmetics in ptx (#3922)
* improved caching for pointer arithmetics

* Add test for pointer arithmetics caching

* Refactor test
2024-04-04 07:33:48 -07:00
George Hotz
7181ffd630 HWCopyQueue in KFD (#4042)
* HWCopyQueue in KFD

* hw compute queue

* test

* move test

* more tests

* fix wait

* fix multimap

* mes crash

* tests pass but slow

* stuff is working

* one more test
2024-04-03 20:14:24 -07:00
chenyu
e3c0ac9fbf remove old envvar "OPT" (#4060) 2024-04-03 14:55:21 -04:00
chenyu
406cb5fd90 const fold ReduceOps (#4059) 2024-04-03 14:39:28 -04:00
chenyu
fe03725b21 const fold cast unrealized_unpadded_const (#4047)
* const fold unrealized_unpadded_const

changed the underlying arg directly

* CAST_BEFORE_VIEW folds some

* fix const index in getitem
2024-04-03 12:31:24 -04:00
Szymon Ożóg
e5a9bff899 Add pattern matcher tests, move uop transforms from assembly to pattern (#4056)
matcher
2024-04-03 09:06:43 -07:00
chenyu
f61ed869f5 Use exec_alu for lazy const folding (#4039) 2024-04-02 20:52:05 -04:00
chenyu
85edc493b0 uops const fold rules to prevent tautological compare warnings (#4041)
* uops const fold rules to prevent tautological compare warnings

`bool < false` is false, `true < bool` is false, `a == a` is true, `a != a` is false

* not true for nan

* and nan does not work with llvm

* full truth table test

* revert a==a

* comments and indents
2024-04-02 16:45:58 -04:00
Patrick Tsai
0147174ad6 Embedding in one kernel (#4036)
* Embedding is in one kernel

* embedding is one kernel

* rm extra line

* newline

* bert test counts state vars?

* add a test?

* move items around

---------

Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-04-02 11:38:21 -04:00
Dan Hoffman
5311b45053 re-enable has_local check for linearizer test (#4034)
Co-authored-by: Dan Hoffman <daniel.hoffman@intel.com>
2024-04-02 00:02:03 -04:00
George Hotz
7425a0c646 CommandQueue is the future (#3950)
* start of command queue

* cq work

* runs

* cleanup

* outs set

* read is gone

* future buffer work

* command queue is better

* command queue works

* loadops

* delete unneeded

* command queue works

* upd

* fix tests

* use CommandQueue in compile

* delay sync
2024-04-01 17:35:48 -07:00