qazal
773b036c61
share can_pad in ops [pr] ( #7550 )
2024-11-05 17:58:50 +08:00
George Hotz
075bdb81b3
remove Ops.REDUCE ( #7541 )
2024-11-05 09:41:28 +08:00
George Hotz
ab14fc1f5b
GroupOp.Irreducible [pr] ( #7540 )
2024-11-05 09:35:34 +08:00
George Hotz
d30537494a
remove do_reduce [pr] ( #7536 )
2024-11-05 01:46:11 +08:00
George Hotz
cb57774b64
pre index load and store [pr] ( #7535 )
...
* pre index load and store [pr]
* check ptrtype
2024-11-05 01:21:14 +08:00
George Hotz
76cc59940d
only match with op, not arg [pr] ( #7534 )
2024-11-05 00:43:17 +08:00
George Hotz
99bd4372a5
Ops.ALU is no more, the arg is just an op ( #7525 )
...
* op arg alu [pr]
* more
* more passing
* fix more tests
* more tests passing
* fix single failing test
* so much cleaner
* noop to not have process replay trigger
* fix ptx
2024-11-05 00:22:22 +08:00
George Hotz
e2204378d9
more GroupOp [pr] ( #7524 )
2024-11-04 18:40:06 +08:00
George Hotz
c1585bcc9e
flatten ops ( #7523 )
...
* flatten ops
* fix mypy
2024-11-04 18:07:23 +08:00
George Hotz
bac251d2c1
idx_load_store in lowerer [pr] ( #7477 )
...
* idx_load_store in lowerer [pr]
* fix tests (#7513)
Co-authored-by: John Doe <null@mail.com>
* work
---------
Co-authored-by: Carl Basho <76494676+oldpondplop@users.noreply.github.com>
Co-authored-by: John Doe <null@mail.com>
2024-11-04 10:18:40 +08:00
chenyu
7758f7211b
Revert "s/UPat/Pat ( #7506 )" [pr] ( #7517 )
...
* Revert "s/UPat/Pat (#7506)"
This reverts commit 400011a8c1.
* fix
2024-11-03 16:33:02 -05:00
chenyu
400011a8c1
s/UPat/Pat ( #7506 )
2024-11-03 08:26:19 -05:00
qazal
37f8578953
s/BUFFER_UOPS/BUFOPS ( #7501 )
2024-11-03 10:17:33 +02:00
George Hotz
c8bf09b7d4
s/UOps/Ops ( #7500 )
...
* s/UOps/Ops [pr]
* fix
2024-11-03 11:26:10 +08:00
George Hotz
06f476b371
late transcendental ( #7498 )
2024-11-03 10:53:58 +08:00
chenyu
baaec39ffc
update get_transcendental_patterns [pr] ( #7489 )
...
i think this is better than `(p[0], cast(Callable, p[1]))`
2024-11-02 14:25:31 -04:00
chenyu
55bd136746
clean up reshape_and_permute ( #7488 )
...
probably will rewrite it later as reshape and permute function on Kernel, but for now it's shorter with better types
2024-11-02 13:44:14 -04:00
chenyu
74c7b9d84a
clean up Kernel.name ( #7486 )
...
* clean up Kernel.name
* narrow that str
2024-11-02 12:48:37 -04:00
ignaciosica
18bd98c203
Add shl and shr to llvmir ( #7449 )
...
* add shl and shr to llvmir
* hotfix: enforce type alignment for shr and shl in all backends
* hotfix: change shl and shr spec
* hotfix: typo
* hotfix: refactor shl and shr rules and add casting to ptx shl
* hotfix: bug
* hotfix: ptx shl and shr require buint32
* hotfix: cleanups
2024-11-01 23:49:34 +08:00
George Hotz
fe78ed8cb7
improve match speed [pr] ( #7465 )
...
* improve match speed [pr]
* no sym in expand
* remove useless rule, sym back
* don't track that
2024-11-01 17:33:53 +08:00
George Hotz
a7ba3d2d91
move reduce to lowerer [pr] ( #7462 )
...
* move reduce to lowerer [pr]
* simpler
2024-11-01 16:39:20 +08:00
George Hotz
2cfca230b5
reduce collapse as a rule ( #7464 )
...
* reduce collapse as a rule
* better [pr]
* cleaner
2024-11-01 16:25:44 +08:00
George Hotz
4f6cf1f8cc
expand DEFINE_ACC [pr] ( #7461 )
2024-11-01 15:20:43 +08:00
chenyu
a21434504b
update payne_hanek_reduction [pr] ( #7455 )
2024-10-31 18:41:22 -04:00
chenyu
5777fca904
clean up cody_waite_reduction magic numbers ( #7452 )
2024-10-31 14:45:04 -04:00
chenyu
5648b9788e
more xlog2 cleanups ( #7451 )
...
following the notations in the paper closer
2024-10-31 13:52:31 -04:00
chenyu
4065c3dec8
remove special 0 case in frexp ( #7450 )
...
we can safely assume input is non-zero, also removed unneeded bitcast
2024-10-31 13:02:33 -04:00
chenyu
53db3478fe
cast to float32 for float16 xlog2 ( #7447 )
...
formula has 2X error with denormal floats
2024-10-31 10:36:29 -04:00
chenyu
5085b2fde7
cleanup xlog2 and remove unneeded functions ( #7446 )
...
denormal_map still looks wrong but a lot cleaner
2024-10-31 09:45:16 -04:00
chenyu
02636bc05e
simpler switch over in xsin ( #7426 )
2024-10-31 08:56:01 -04:00
George Hotz
a43b7a4b7c
less rewrite stages in matcher ( #7445 )
...
* less rewrite stages in matcher
* better name
2024-10-31 19:45:21 +08:00
George Hotz
5dd1ffd5d0
don't const rewrite in cstyle ( #7442 )
...
* don't const rewrite in cstyle
* Update cstyle.py
* simple_symbolic
* fix bfloat16 const on AMD
2024-10-31 19:16:49 +08:00
George Hotz
50ddd11350
lil cleanup matchers [pr] ( #7437 )
...
* move delete_redundant_gates [pr]
* simpler uops test
* addr in delete_redundant_gates
* lines
* correct early delete gates
* shorter find_gate
2024-10-31 17:52:22 +08:00
George Hotz
2e3048fc57
Revert "improve full_graph_rewrite matchers for speed ( #7431 )" ( #7434 )
...
This reverts commit 996152d2de.
2024-10-31 16:16:47 +08:00
George Hotz
996152d2de
improve full_graph_rewrite matchers for speed ( #7431 )
...
* remove finalize [pr]
* early transcendental
* fix tests
* load store indexing runs with devectorize
* move delete_redundant_gates
* ptx has to wait for the mask to move
2024-10-31 16:13:11 +08:00
George Hotz
17c9a9fde4
pm_render [pr] ( #7430 )
...
* pm_render [pr]
* test fixes
* use gep, not src
* ptx only symbolic, not sym
* move cast rules
2024-10-31 15:04:50 +08:00
George Hotz
8fff8fc3e7
replace REDUCE and clean up arange ( #7429 )
...
* break apart arange [pr]
* fix missing
* cleanups to add/mul
* UOps.VECTORIZE
* don't vectorize const
2024-10-31 14:02:20 +08:00
George Hotz
fe2bc4c613
clean up arange/indexing matchers [pr] ( #7427 )
...
* clean up arange/indexing matchers [pr]
* syntax for assign
2024-10-31 12:12:44 +08:00
George Hotz
e446e95974
enforce ctx is called ctx [pr] ( #7424 )
...
* enforce ctx is called ctx [pr]
* fix bug and use has_ctx
* inspect signature
* assert
* no slow asserts
* now we can support contextual reduce
2024-10-31 11:39:19 +08:00
chenyu
9b08bb4c3e
fold the +x term in sine inside sin_poly ( #7425 )
2024-10-30 23:13:08 -04:00
chenyu
0739895b4d
tiny clean up pow2if and payne_hanek_reduction ( #7423 )
2024-10-30 22:22:48 -04:00
chenyu
118dd7721f
clean up transcendental.rintk [pr] ( #7422 )
...
added unit tests and updated the comment. it's rounding away from 0 for negatives
2024-10-30 20:37:28 -04:00
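The rounding behavior noted in the entry above can be illustrated with a minimal scalar sketch. It is not the code from `transcendental.rintk`; the function name `rintk_sketch` is hypothetical, and it assumes one reading of the comment: add a half with the sign of the input, then truncate toward zero, so that negative halves round away from 0.

```python
def rintk_sketch(x: float) -> int:
    # Hypothetical stand-in, not tinygrad's implementation.
    # Add +0.5 for non-negative inputs and -0.5 for negative inputs,
    # then truncate toward zero with int().
    half = -0.5 if x < 0 else 0.5
    return int(x + half)


if __name__ == "__main__":
    for value in (2.5, -2.5, 1.4, -1.2):
        # Python's built-in round() uses round-half-to-even, shown for contrast.
        print(value, rintk_sketch(value), round(value))
```

Under this assumption, -2.5 maps to -3, whereas Python's built-in `round(-2.5)` gives -2.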
George Hotz
7039fba406
move indexing first ( #7409 )
...
* move indexing first [pr]
* no create gate
* fix create_gate
* fix load/store folding
* fix index folding
* remove comment, no process replay
2024-10-31 00:50:35 +08:00
George Hotz
133fe81cc5
Revert "Revert "move up migrate + new gated fold ( #7403 )" ( #7406 )" ( #7407 )
...
* Revert "Revert "move up migrate + new gated fold (#7403)" (#7406)"
This reverts commit ea5654a9bc.
* test padded in emulation too
* bring back early folding
2024-10-30 23:25:45 +08:00
chenyu
ea5654a9bc
Revert "move up migrate + new gated fold ( #7403 )" ( #7406 )
...
This reverts commit adccfade7f.
2024-10-30 23:02:18 +08:00
George Hotz
adccfade7f
move up migrate + new gated fold ( #7403 )
...
* move up migrate + new gated fold [pr]
* vcount for const ptr
* move those rules there
* fix openpilot
2024-10-30 22:14:01 +08:00
chenyu
16e60d25b9
move polyN to helper [pr] ( #7405 )
...
also move `eval_uop` to `test.helpers`
2024-10-30 10:09:57 -04:00
George Hotz
f3bd5cbf78
simplest migration of indexing [pr] ( #7402 )
...
* simplest migration of indexing [pr]
* fix locals/barrier
2024-10-30 20:58:18 +08:00
George Hotz
4e2895f8d2
safe changes from new dtype branch [pr] ( #7397 )
...
* safe changes from new dtype branch [pr]
* only image test on GPU
2024-10-30 17:18:48 +08:00
chenyu
f389e1a8a0
test more special values for sin/cos/tan [pr] ( #7386 )
2024-10-29 21:13:37 -04:00