* put the ranges on store instead of after
* better assert
* fix stuff
* comment out slow rules i don't understand
* simpler rule
* closer
* return false for store
* fix loop
* only a few schedule failures remain
* remove stores to self
* all tests pass locally
* remove junk
* regression test and fix
* better test, bump broken torch count
* bugfix with regression test
* new fusion is better
* start work on target
* add test
* update actions to use DEV
* update docs
* update readmes
* tests need that too
* update example
* update tests (comments)
* fix that test
* ruff
* mypy
* oops
* remove getenvs
* don't add Target yet
* and the test
* lint
* and docs
* more stuff
* assert
* few more fixes
* test assert
* cdna emulator work
* accvgprs
* cdna passes most tests
* ruff
* add cdna4 to tests
* cdna emu
* crash
* pass?
* work
* gen
* clean up wave_size access
* asm_gemm passes
* remove acc from dsl.py, emulator can keep its different reg file
it's purely an encoding here, the ASM_GEMM already encodes acc srcs with v[], this can
be cleaned up later, but not functionally required for emulator.
* split asm_gemm tests to ones fast on the emulator
* don't do that
* 124 stays null on rdna
* the segfault was because of hw regs, not this
* Revert "clean up wave_size access", it's explicitly tested
This reverts commit 1202ff5787.
* nullcopyout
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* move more to mixins
* revert
* move some
* do not change
* more
* fix tests
* Revert "more"
This reverts commit d942d59fa4.
* go
* work
* more
* work
* guard
* base
* start new rdna4
* work
* plus works
* more pass
* rdna4
* assembly/amd: fix RDNA4 emulator for float16 and VOP3 clamp
* stale
* rev
* rr
* rdna4 emu tests
* cleanup
* cleanup
* simp
* works
* better factorizaion
* hacks
* fix mockgpu
* guard both
* cleaner
* gate
* bug fix and a few tests
* all test_tiny
* write python emulator from RDNA3 psuedocode in pdf
* emu2
* more emu
* working
* more psueod
* progress
* cleanups
* delete junk
* delete stale files
* just emu
* work
* emu compare
* bemu
* cleanups and more failures
* revert bench emu
* fix emu cmp
* four tests fail
* bugfixes
* dsl
* ext
* refactor
* dsl
* div scale fix
* test_emu
* fix emu tests
* pcode
* test pcode
* top imports
* fix test_emu to use run_asm
* emu tests on real hardware
* more tests
* more emu tests
* more
* work
* work
* bug fix
* bugfixes
* fix fp16 gemm
* all ops tests pass in emulator
* fix llvm tests
* fix a few more tests
* fix mockgpu timeout
* having fun with python asm dsl
* rdna3
* meh
* all in rdna3
* work
* more work
* work
* integration
* tests
* simpler
* simpler
* asm
* better
* simpler
* progress
* emu
* simpler
* emu
* tests
* types
* vopd
* cleaups
* work
* memory ranges
* add tracing
* refactors
* run_asm exit
* more readable
* compare to remu
* test gemm
* bug + stale
* more tests
* refactor
* tests fix
* more ins
* more instructions
* refactor
* faster
* match case
* match case
* simpler
* work
* tests
* run_asm
* work
* bug fixes
* more emu
* alu/emu
* refactor
* no pipeline emu yet
* alu direct
* fix
* bugfixes + new test
* fix exceptions in emulators
* update gen.py
* pylint
* no pdf
* improve bench_emu
* speedups
* cleanups
* more tests