Commit Graph

1350 Commits

Author SHA1 Message Date
Christopher Milan
83971860d8 ci: simplify webgpu install (#16557) 2026-06-10 22:57:19 -04:00
Christopher Milan
6e1b61f16f cleanup some amd deps (#16563)
don't load hsa runtime, remove ib autogen
2026-06-10 19:01:56 -04:00
George Hotz
fd76ac992e cstyle renderer is new style [pr] (#16484)
* cstyle new style

* switch cstyle renderer to new style

* fix hip

* fixes

* fix webgpu

* correct webgpu is_packed

* fix dsp

* fixes

* fix Ops.RANGE must be CONST

* old style render access

* this is correct

* fix cstyle to good

* dl/dr

* as array

* fix spec

* remove define_local/define_reg

* buffer in shrink

* fix test_tiny

* all tests fix

* param args aren't realized

* wgsl fix

* work

* new gate

* fix opencl qcom

* process replay

* sort order

* fix render index
2026-06-09 18:36:01 -07:00
Christopher Milan
97d483350c ci: download prebuilt ocelot (#16554) 2026-06-09 19:51:33 -04:00
Christopher Milan
857b1f5399 ci: more parallelism, less duplication (#16509) 2026-06-05 21:26:19 -04:00
Christopher Milan
9dac781e45 ci: use uv (#16492) 2026-06-03 21:38:50 -04:00
Christopher Milan
f43cba5765 ci: native python where possible (#16473)
linters stays at 3.11
2026-06-02 22:40:12 -04:00
George Hotz
ffadd7a315 remove intel and amx support (#16482) 2026-06-02 18:53:05 -07:00
Christopher Milan
9897658895 ci: fix ocelot compilation on macos (#16471) 2026-06-02 12:43:31 -04:00
Christopher Milan
c6cad1ad67 ci: standardize runs-on (#16466)
* ci: use macos 26

* ugh github

* stick with github for arm
2026-06-01 21:39:58 -04:00
Christopher Milan
b0ecbb34d9 ci: cleanup python backend tests (#16465) 2026-06-01 20:08:05 -04:00
Christopher Milan
2d0f132a3b ci: cleanup more duplicate tests (#16462) 2026-06-01 18:56:29 -04:00
Christopher Milan
c377d01491 ci: run dsp on tinygrad[testing] (#16442) 2026-05-29 21:16:56 -04:00
Christopher Milan
d943493b79 ci: remove duplicate op compile test (#16441) 2026-05-29 19:20:31 -04:00
Christopher Milan
ef50a49693 ci: macos dev matrix (#16436) 2026-05-29 17:40:32 -04:00
Christopher Milan
6e0d5262dc ci: autocancel outdated pr jobs (#16424) 2026-05-28 23:14:35 -04:00
Christopher Milan
69aa2054f6 rename clangjit to clang (#16423) 2026-05-28 22:41:58 -04:00
Christopher Milan
a909acb882 move llvmspeed to benchmarks (#16422) 2026-05-28 22:26:22 -04:00
Christopher Milan
7d38edffdb ci: dev matrix (#16420)
windows just runs test_tiny
2026-05-28 22:04:04 -04:00
George Hotz
c87f3433d1 use namespace runners (#16387)
Co-authored-by: Christopher Milan <chrismilan@ucla.edu>
2026-05-28 18:05:46 -04:00
Christopher Milan
c8af163d2b disable process replay by default (#16419)
enable process replay with [pr] and assert with [PR]
process replay no longer captures on master
2026-05-28 17:36:28 -04:00
Christopher Milan
aacc8addf4 ci: use ubuntu 24.04 (#16393) 2026-05-26 23:22:01 -04:00
George Hotz
322693dcd3 hotfix: bump Mac pytest timeout to 4 minutes (try 2) 2026-05-25 18:23:21 -07:00
George Hotz
942cb42b97 Revert "hotfix: bump Mac pytest timeout to 4 minutes"
This reverts commit 695a0069ed.
2026-05-25 17:25:11 -07:00
George Hotz
695a0069ed hotfix: bump Mac pytest timeout to 4 minutes 2026-05-25 17:20:19 -07:00
Christopher Milan
35461d4d8f ci: cleanup some deps [pr] (#16340) 2026-05-22 19:16:08 -04:00
Christopher Milan
518e60534e only load tinymesa_cpu when LVP is explicitly requested (#16320) 2026-05-21 19:03:13 -04:00
George Hotz
58d58c1659 remove DEVECTORIZE (#16290)
* remove DEVECTORIZE

* fully remove DEVECTORIZE
2026-05-20 13:25:49 -07:00
chenyu
7af7b6703a relax policy ASSERT_MIN_STEP_TIME to 3.2 (#16273) 2026-05-19 22:29:09 -04:00
George Hotz
da7414d6dc fix RUN_PICKLE and test it (#16272)
* add test for openpilot RUN_PICKLE

* fix RUN_PICKLE and test it
2026-05-19 17:00:25 -07:00
ttomsa
aa1e59ab97 X86 with Ops.INS (#14873)
* draft

* cleanup test_encodings

* cleanup test_isel

* model flag state and support rematerialization

* woops

* add vbroadcastss instruction

* don't fuse load if used multiple times in src

* add movabs instruction and fix idiv

* fixes

* add x86 backend to tests

* float16 fix

* rm TwoAddress2nd

* add BARRIER

* test windows ci

* yup isel fixes the mask stuff too and its beautiful

* add cmoves to the spec

* support storing imms

* no TUPLE_ORDER, breaks tests

* fix remaining seg faults

* add float max

* always fuse index

* minor

* fix DEFINE_VAR/SPECIAL and enable multithreading

* linter

* more linter

* more

* more

* more

* let's try this

* perhaps

* start new scheduler

* more scheduling info

* cleaner shuffle functions

* fixup isel tests

* skip bounds check when NOOPs exist

* skip inf rewrite tests

* fix const tag hack and add x86ops to _shape

* fix

* skip a few tests

* func arg order independent from op value

* x86 goes in own linearize

* switch to PARAM

* more

* add min x86op and neg in decomps

* do mulacc in isel

* use def_reg in test_encodings

* enable emulated int64 tests

* how much does this fix

* Ops becomes OpType

* fix

* rm noqa

* rm machine scheduler stuff

* and this

* allow for extending enums and move X86Ops out of uop

* fix imports

* rm X86GroupOp from ops.py

* spacing

* tell mypy to shut up

* more linter

* add x86op test

* allow set[X86Ops] in upat

* move NOOPs to pre_isel_matcher and rm NOOP from spec

* more asserts

* also this

* cleanup encode

* simplify live range

* fix idiv

* add Ops.INS to x86

* more changes

* more changes

* more changes

* fix

* fix

* fix

* fix

* print formatted assembly

* fix 8bit idiv?

* oops

* enable float16  and unaligned vector load/store

* actually no

* move x86 tests

* no more bool cast

* fix

* linter

* linter

* move X86Ops to x86.py

* fix vpbroadcast

* cleanups

* linter

* print correct reg names

* canonical max

* move max/min and add test

* support float16 vector load/store

* rm bad rewrite

* vpsrldq can't access memory

* regalloc takes renderer

* enable vector load/store on all dtypes

* more isel tests

* rm this for now

* a lot better

* fix

* fix

* fix

* deal with flags correctly

* fix

* enable gep noop rule

* fix

* fix

* fix

* add callee saved registers

* use Ops.CONST instead of X86Ops.IMM

* fix

* enable TUPLE_ORDER

* fix

* rm x86 code in linearizer

* fix

* fix

* fix

* move isa rewrites to codegen

* fix

* fix

* skip test_linearizer.py

* skip more tests

* fix

* fix for idiv/mod changes

* fix

* don't use fmadd if it duplicates fused op

* hacky

* fix

* cleanups

* cleanups

* fix

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-05-19 12:42:54 -07:00
George Hotz
b2e8102209 25000 lines for x86 backend 2026-05-19 11:27:41 -07:00
Christopher Milan
7515824a6d ci: actually use clang-20, enable bfloat16 (#16249) 2026-05-18 19:06:43 -04:00
Christopher Milan
891a1ae7c2 onnx: remove dtype_fallback (#15717) 2026-05-14 22:06:57 -04:00
Christopher Milan
9a365d9978 ci: fix null image tests (#16188) 2026-05-13 18:00:05 -04:00
Christopher Milan
f1fdd2ccec ci: add IMAGE=1 compile-only tests (#16182)
* ci: add IMAGE=1 compile-only tests

* fix
2026-05-12 23:40:32 -04:00
Christopher Milan
7d0c5ab689 ci: ocelot needs nvcc on linux (#16178)
* ci: ocelot needs nvcc on linux

* cudart
2026-05-12 23:13:48 -04:00
Christopher Milan
3844a31f87 ci: untangle cuda/ocelot, less apt (#16171)
* ci: untangle cuda/ocelot, less apt

* ldconfig
2026-05-12 18:14:03 -04:00
Christopher Milan
316607f004 dsp: don't use docker in ci (#16167)
* dsp: don't use docker in ci

* add setup script for macos docker
2026-05-12 17:11:03 -04:00
nimlgen
a708542308 fix ci spec (#16156) 2026-05-12 17:57:11 +03:00
Christopher Milan
7ba55ad3ba nv: autogen regs (#16139)
* nv: autogen regs

* flcn cot

* ci

* gen
2026-05-11 18:52:24 -04:00
Christopher Milan
a7512e0d12 PYTHON: images have no alignment constraints (by default) (#16115) 2026-05-08 20:35:03 -04:00
Christopher Milan
105b037c3c cl: image alignment in arch (#16106) 2026-05-08 19:33:33 -04:00
George Hotz
1d1b726cf6 hotfix: disable flaky framework pytest 2026-05-07 17:05:06 -07:00
qazal
f9083cf901 use subactions for benchmark.yml process replay [pr] (#13396) 2026-05-08 03:46:25 +09:00
Christopher Milan
bd06ea9f97 am: simplify import_module (#16046) 2026-05-05 19:25:53 -04:00
chenyu
34fe37d64e use FLOORDIV and FLOORMOD (#16048)
* use FLOORDIV and FLOORMOD

also removed CORRECT_DIVMOD_FOLDING

* fix

* Revert "fix"

This reverts commit 86af33b88ef31943c61e67189b072eca4896409a.

* fix

* fix
2026-05-05 18:32:54 -04:00
Christopher Milan
76ff378007 autogen: fewer apt dependencies (#16049) 2026-05-05 17:22:41 -04:00
chenyu
9c37a0c75d Ops.FLOORDIV and Ops.FLOORMOD (#16038)
* Ops.FLOORDIV and Ops.FLOORMOD

lowered into IDIV and MOD in get_late_rewrite_patterns

* still need this

* exclude

* like that?
2026-05-05 11:42:14 -04:00
Christopher Milan
1c8cb0769a am: autogen asic_regs (#16004) 2026-05-04 22:52:07 -04:00