Commit Graph

1093 Commits

Author SHA1 Message Date
qazal
773b036c61 share can_pad in ops [pr] (#7550) 2024-11-05 17:58:50 +08:00
George Hotz
075bdb81b3 remove Ops.REDUCE (#7541) 2024-11-05 09:41:28 +08:00
George Hotz
ab14fc1f5b GroupOp.Irreducible [pr] (#7540) 2024-11-05 09:35:34 +08:00
George Hotz
d30537494a remove do_reduce [pr] (#7536) 2024-11-05 01:46:11 +08:00
George Hotz
cb57774b64 pre index load and store [pr] (#7535)
* pre index load and store [pr]

* check ptrtype
2024-11-05 01:21:14 +08:00
George Hotz
76cc59940d only match with op, not arg [pr] (#7534) 2024-11-05 00:43:17 +08:00
George Hotz
99bd4372a5 Ops.ALU is no more, the arg is just an op (#7525)
* op arg alu [pr]

* more

* more passing

* fix more tests

* more tests passing

* fix single failing test

* so much cleaner

* noop to not have process replay trigger

* fix ptx
2024-11-05 00:22:22 +08:00
George Hotz
e2204378d9 more GroupOp [pr] (#7524) 2024-11-04 18:40:06 +08:00
George Hotz
c1585bcc9e flatten ops (#7523)
* flatten ops

* fix mypy
2024-11-04 18:07:23 +08:00
George Hotz
bac251d2c1 idx_load_store in lowerer [pr] (#7477)
* idx_load_store in lowerer [pr]

* fix tests (#7513)

Co-authored-by: John Doe <null@mail.com>

* work

---------

Co-authored-by: Carl Basho <76494676+oldpondplop@users.noreply.github.com>
Co-authored-by: John Doe <null@mail.com>
2024-11-04 10:18:40 +08:00
chenyu
7758f7211b Revert "s/UPat/Pat (#7506)" [pr] (#7517)
* Revert "s/UPat/Pat (#7506)"

This reverts commit 400011a8c1.

* fix
2024-11-03 16:33:02 -05:00
chenyu
400011a8c1 s/UPat/Pat (#7506) 2024-11-03 08:26:19 -05:00
qazal
37f8578953 s/BUFFER_UOPS/BUFOPS (#7501) 2024-11-03 10:17:33 +02:00
George Hotz
c8bf09b7d4 s/UOps/Ops (#7500)
* s/UOps/Ops [pr]

* fix
2024-11-03 11:26:10 +08:00
George Hotz
06f476b371 late transcendental (#7498) 2024-11-03 10:53:58 +08:00
chenyu
baaec39ffc update get_transcendental_patterns [pr] (#7489)
I think this is better than `(p[0], cast(Callable, p[1]))`
2024-11-02 14:25:31 -04:00
chenyu
55bd136746 clean up reshape_and_permute (#7488)
probably will rewrite it later as reshape and permute function on Kernel, but for now it's shorter with better types
2024-11-02 13:44:14 -04:00
chenyu
74c7b9d84a clean up Kernel.name (#7486)
* clean up Kernel.name

* narrow that str
2024-11-02 12:48:37 -04:00
ignaciosica
18bd98c203 Add shl and shr to llvmir (#7449)
* add shl and shr to llvmir

* hotfix: enforce type alignment for shr and shl in all backends

* hotfix: change shl and shr spec

* hotfix: typo

* hotfix: refactor shl and shr rules and add casting to ptx shl

* hotfix: bug

* hotfix: ptx shl and shr require buint32

* hotfix: cleanups
2024-11-01 23:49:34 +08:00
George Hotz
fe78ed8cb7 improve match speed [pr] (#7465)
* improve match speed [pr]

* no sym in expand

* remove useless rule, sym back

* don't track that
2024-11-01 17:33:53 +08:00
George Hotz
a7ba3d2d91 move reduce to lowerer [pr] (#7462)
* move reduce to lowerer [pr]

* simpler
2024-11-01 16:39:20 +08:00
George Hotz
2cfca230b5 reduce collapse as a rule (#7464)
* reduce collapse as a rule

* better [pr]

* cleaner
2024-11-01 16:25:44 +08:00
George Hotz
4f6cf1f8cc expand DEFINE_ACC [pr] (#7461) 2024-11-01 15:20:43 +08:00
chenyu
a21434504b update payne_hanek_reduction [pr] (#7455) 2024-10-31 18:41:22 -04:00
chenyu
5777fca904 clean up cody_waite_reduction magic numbers (#7452) 2024-10-31 14:45:04 -04:00
chenyu
5648b9788e more xlog2 cleanups (#7451)
following the notations in the paper closer
2024-10-31 13:52:31 -04:00
chenyu
4065c3dec8 remove special 0 case in frexp (#7450)
we can safely assume input is non-zero, also removed unneeded bitcast
2024-10-31 13:02:33 -04:00
chenyu
53db3478fe cast to float32 for float16 xlog2 (#7447)
formula has 2X error with denormal floats
2024-10-31 10:36:29 -04:00
chenyu
5085b2fde7 cleanup xlog2 and remove unneeded functions (#7446)
denormal_map still looks wrong but a lot cleaner
2024-10-31 09:45:16 -04:00
chenyu
02636bc05e simpler switch over in xsin (#7426) 2024-10-31 08:56:01 -04:00
George Hotz
a43b7a4b7c less rewrite stages in matcher (#7445)
* less rewrite stages in matcher

* better name
2024-10-31 19:45:21 +08:00
George Hotz
5dd1ffd5d0 don't const rewrite in cstyle (#7442)
* don't const rewrite in cstyle

* Update cstyle.py

* simple_symbolic

* fix bfloat16 const on AMD
2024-10-31 19:16:49 +08:00
George Hotz
50ddd11350 lil cleanup matchers [pr] (#7437)
* move delete_redundant_gates [pr]

* simpler uops test

* addr in delete_redundant_gates

* lines

* correct early delete gates

* shorter find_gate
2024-10-31 17:52:22 +08:00
George Hotz
2e3048fc57 Revert "improve full_graph_rewrite matchers for speed (#7431)" (#7434)
This reverts commit 996152d2de.
2024-10-31 16:16:47 +08:00
George Hotz
996152d2de improve full_graph_rewrite matchers for speed (#7431)
* remove finalize [pr]

* early transcendental

* fix tests

* load store indexing runs with devectorize

* move delete_redundant_gates

* ptx has to wait for the mask to move
2024-10-31 16:13:11 +08:00
George Hotz
17c9a9fde4 pm_render [pr] (#7430)
* pm_render [pr]

* test fixes

* use gep, not src

* ptx only symbolic, not sym

* move cast rules
2024-10-31 15:04:50 +08:00
George Hotz
8fff8fc3e7 replace REDUCE and clean up arange (#7429)
* break apart arange [pr]

* fix missing

* cleanups to add/mul

* UOps.VECTORIZE

* don't vectorize const
2024-10-31 14:02:20 +08:00
George Hotz
fe2bc4c613 clean up arange/indexing matchers [pr] (#7427)
* clean up arange/indexing matchers [pr]

* syntax for assign
2024-10-31 12:12:44 +08:00
George Hotz
e446e95974 enforce ctx is called ctx [pr] (#7424)
* enforce ctx is called ctx [pr]

* fix bug and use has_ctx

* inspect signature

* assert

* no slow asserts

* now we can support contextual reduce
2024-10-31 11:39:19 +08:00
chenyu
9b08bb4c3e fold the +x term in sine inside sin_poly (#7425) 2024-10-30 23:13:08 -04:00
chenyu
0739895b4d tiny clean up pow2if and payne_hanek_reduction (#7423) 2024-10-30 22:22:48 -04:00
chenyu
118dd7721f clean up transcendental.rintk [pr] (#7422)
added unit tests and updated the comment. it's rounding away from 0 for negatives
2024-10-30 20:37:28 -04:00
George Hotz
7039fba406 move indexing first (#7409)
* move indexing first [pr]

* no create gate

* fix create_gate

* fix load/store folding

* fix index folding

* remove comment, no process replay
2024-10-31 00:50:35 +08:00
George Hotz
133fe81cc5 Revert "Revert "move up migrate + new gated fold (#7403)" (#7406)" (#7407)
* Revert "Revert "move up migrate + new gated fold (#7403)" (#7406)"

This reverts commit ea5654a9bc.

* test padded in emulation too

* bring back early folding
2024-10-30 23:25:45 +08:00
chenyu
ea5654a9bc Revert "move up migrate + new gated fold (#7403)" (#7406)
This reverts commit adccfade7f.
2024-10-30 23:02:18 +08:00
George Hotz
adccfade7f move up migrate + new gated fold (#7403)
* move up migrate + new gated fold [pr]

* vcount for const ptr

* move those rules there

* fix openpilot
2024-10-30 22:14:01 +08:00
chenyu
16e60d25b9 move polyN to helper [pr] (#7405)
also move `eval_uop` to `test.helpers`
2024-10-30 10:09:57 -04:00
George Hotz
f3bd5cbf78 simplest migration of indexing [pr] (#7402)
* simplest migration of indexing [pr]

* fix locals/barrier
2024-10-30 20:58:18 +08:00
George Hotz
4e2895f8d2 safe changes from new dtype branch [pr] (#7397)
* safe changes from new dtype branch [pr]

* only image test on GPU
2024-10-30 17:18:48 +08:00
chenyu
f389e1a8a0 test more special values for sin/cos/tan [pr] (#7386) 2024-10-29 21:13:37 -04:00