* dtype.vec shapes
* something
* Closer
* more passes
* shape is in spec
* fix reduce
* image dtype shape correct
* lil
* use reshape on image
* need BUFFER there
* remove that test
* fix ptx + x86
* fix nir
* x86 fix maybe
* x86 fixups
* x86 fix
* don't check that for NOOP
The Linux path of pci_scan_bus reads /sys/bus/pci/devices/.../class and
skips devices whose base class doesn't match. The macOS (IOKit) path
appended every IOPCIDevice unconditionally, so callers that supplied
base_class to narrow down to e.g. display devices would also get the
audio companion function of a multifunction GPU.
Concretely, an NVIDIA RTX Pro 6000 Blackwell exposes:
  10de:2bb1 class 0x030000 (display)
  10de:22e8 class 0x040300 (multimedia audio)
A probe for base_class=3 returned both. With the sorted() at the end of
pci_scan_bus, 22e8 (audio) came first, so the NV runtime picked the
audio function as device 0 and stalled on RESIZE_BAR.
This change applies the same filter to the macOS path, mirroring the
Linux filter on line 70 and using the existing read_prop helper.
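
A minimal Python sketch of the filtering idea described above. It is not the
actual pci_scan_bus code: the names base_class_of and filter_by_base_class and
the (id, class code) tuple shape are hypothetical. It assumes the base class is
the top byte of the 24-bit class code, as in the two values listed above. The
sort key used here is also an assumption for illustration only.

```python
def base_class_of(class_code: int) -> int:
    """Return the PCI base class, i.e. the top byte of the 24-bit class code."""
    return (class_code >> 16) & 0xFF


def filter_by_base_class(devices, base_class=None):
    """Keep only devices whose base class matches (hypothetical helper).

    devices: iterable of (pci_id, class_code) pairs.
    base_class: int to match, or None to keep every device.
    """
    kept = [
        (dev_id, class_code)
        for dev_id, class_code in devices
        if base_class is None or base_class_of(class_code) == base_class
    ]
    # Sorted by the tuple itself; the real sort key may differ.
    return sorted(kept)


# The two functions from the example above.
devices = [("10de:2bb1", 0x030000), ("10de:22e8", 0x040300)]

unfiltered = filter_by_base_class(devices)                  # both functions
display_only = filter_by_base_class(devices, base_class=3)  # display only
```

With no filter both functions are returned; with base_class=3 only the
display function remains, so the audio function can no longer be picked as
device 0.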
Co-authored-by: Christopher Bradford <christopher.bradford@joby.aero>
* draft
* cleanup test_encodings
* cleanup test_isel
* model flag state and support rematerialization
* woops
* add vbroadcastss instruction
* don't fuse load if used multiple times in src
* add movabs instruction and fix idiv
* fixes
* add x86 backend to tests
* float16 fix
* rm TwoAddress2nd
* add BARRIER
* test windows ci
* yup isel fixes the mask stuff too and it's beautiful
* add cmoves to the spec
* support storing imms
* no TUPLE_ORDER, breaks tests
* fix remaining seg faults
* add float max
* always fuse index
* minor
* fix DEFINE_VAR/SPECIAL and enable multithreading
* linter
* more linter
* more
* more
* more
* let's try this
* perhaps
* start new scheduler
* more scheduling info
* cleaner shuffle functions
* fixup isel tests
* skip bounds check when NOOPs exist
* skip inf rewrite tests
* fix const tag hack and add x86ops to _shape
* fix
* skip a few tests
* func arg order independent from op value
* x86 goes in own linearize
* switch to PARAM
* more
* add min x86op and neg in decomps
* do mulacc in isel
* use def_reg in test_encodings
* enable emulated int64 tests
* how much does this fix
* Ops becomes OpType
* fix
* rm noqa
* rm machine scheduler stuff
* and this
* allow for extending enums and move X86Ops out of uop
* fix imports
* rm X86GroupOp from ops.py
* spacing
* tell mypy to shut up
* more linter
* add x86op test
* allow set[X86Ops] in upat
* move NOOPs to pre_isel_matcher and rm NOOP from spec
* more asserts
* also this
* cleanup encode
* simplify live range
* fix idiv
* add Ops.INS to x86
* more changes
* more changes
* more changes
* fix
* fix
* fix
* fix
* print formatted assembly
* fix 8bit idiv?
* oops
* enable float16 and unaligned vector load/store
* actually no
* move x86 tests
* no more bool cast
* fix
* linter
* linter
* move X86Ops to x86.py
* fix vpbroadcast
* cleanups
* linter
* print correct reg names
* canonical max
* move max/min and add test
* support float16 vector load/store
* rm bad rewrite
* vpsrldq can't access memory
* regalloc takes renderer
* enable vector load/store on all dtypes
* more isel tests
* rm this for now
* a lot better
* fix
* fix
* fix
* deal with flags correctly
* fix
* enable gep noop rule
* fix
* fix
* fix
* add callee saved registers
* use Ops.CONST instead of X86Ops.IMM
* fix
* enable TUPLE_ORDER
* fix
* rm x86 code in linearizer
* fix
* fix
* fix
* move isa rewrites to codegen
* fix
* fix
* skip test_linearizer.py
* skip more tests
* fix
* fix for idiv/mod changes
* fix
* don't use fmadd if it duplicates fused op
* hacky
* fix
* cleanups
* cleanups
* fix
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* more deviceless const prerequisites [pr]
* remove that
* arange.contiguous -> arange.clone in tests
arange will soon become a deviceless const, so update tests where it needs to be a buffer