Commit Graph

  • 018a9e2d3c remove match_dtype arg in Tensor._broadcasted (#15440) chenyu 2026-03-23 22:10:39 -04:00
  • cd0152efec Merge branch 'master' into new_x86_backend ttomsa 2026-03-23 22:50:56 +00:00
  • a590eded87 sqtt: rdna4 decoder work (#15434) qazal 2026-03-23 20:49:32 +02:00
  • 109472c37e sqtt: new s_barrier pickles, handle rdna4 barriers in emulator (#15437) qazal 2026-03-23 20:25:28 +02:00
  • 683bb01ead 125 George Hotz 2026-03-23 23:19:15 +08:00
  • ccca6b8ecc test qazal 2026-03-23 13:05:26 +00:00
  • e1039af42f more diff qazal 2026-03-23 12:42:31 +00:00
  • 793ad3e150 Merge remote-tracking branch 'upstream/master' into rdna4 qazal 2026-03-23 12:36:06 +00:00
  • 003bd9534c diff cleanup qazal 2026-03-23 12:32:33 +00:00
  • fa4cdb422e memplan on linears (#15422) nimlgen 2026-03-23 19:50:16 +08:00
  • 2da008ae3b jit: rm replan (#15433) nimlgen 2026-03-23 19:31:51 +08:00
  • e0d151560a sqtt: rdna4 decoder work qazal 2026-03-23 09:53:56 +00:00
  • c4c53418f8 sqtt: comment out flaky rocprof timestamp assert (#15432) qazal 2026-03-23 12:24:04 +02:00
  • 66a86f88a0 simpler Tensor._broadcasted inferred dtype (#15430) chenyu 2026-03-23 05:20:11 -04:00
  • c89576921d Updated the APIs of mnist_gan (#15429) Pham Nguyen Hung 2026-03-23 09:04:00 +00:00
  • c62dea6881 ai slop flash attention (it works) (#15401) George Hotz 2026-03-23 16:15:10 +08:00
  • 1568a5ed07 viz: show dispatch to exec delay in sidebar (#15428) qazal 2026-03-23 09:59:59 +02:00
  • ddaeebb500 nir: add shift support (#15426) Christopher Milan 2026-03-23 00:37:44 -07:00
  • c74fa9bbe1 fix jitbeam not triggered (#15424) nimlgen 2026-03-23 15:34:59 +08:00
  • fd3559103b viz/cli: better error message for empty itrace (#15425) qazal 2026-03-23 08:50:20 +02:00
  • 395aacd77d jit: prune on linear (#15423) nimlgen 2026-03-23 14:10:34 +08:00
  • 248cd9b39f make Tensor init the only caller of Tensor.from_uop (#15421) chenyu 2026-03-23 00:29:08 -04:00
  • 67dcc79fdd push Tensor(symbolic) logic to Tensor.from_uop (#15420) chenyu 2026-03-22 23:49:35 -04:00
  • 2087df814f remove *0 hack in sign, gradient materializes zeros for unconnected nodes (#15416) gg 2026-03-22 12:49:26 -04:00
  • c7b18e6108 viz: sqtt printer in viz/cli.py (#15411) qazal 2026-03-22 17:17:05 +02:00
  • bcc08307da removed unused named arg in rules [pr] (#15414) chenyu 2026-03-22 09:25:46 -04:00
  • 2363bceb47 viz: no context enters in cli, update llama profile (#15404) qazal 2026-03-21 22:47:02 +02:00
  • 35fc12b839 Merge remote-tracking branch 'upstream/master' into new_x86_backend ttomsa 2026-03-21 19:16:59 +00:00
  • a9ceaf3c5f sqtt: link dispatch to exec (#15396) qazal 2026-03-21 16:48:58 +02:00
  • 9656d97d97 jit: captures linears, not execitems (#15399) nimlgen 2026-03-21 16:32:12 +08:00
  • c13d9d29ff add SHAPED_WMMA (#15400) George Hotz 2026-03-21 16:16:03 +08:00
  • 41a9b09683 minimal vec in amd_copy_matmul (#15398) George Hotz 2026-03-21 14:57:21 +08:00
  • 30b3054fd5 whitespace cleanups in viz and sqtt.py (#15395) qazal 2026-03-20 21:46:19 +02:00
  • 71ccc69c52 FP8=1 llama works again, hipcc can run on macos (#15394) qazal 2026-03-20 16:43:15 +02:00
  • 9470d5193a deterministic decomp apply order (#15393) Christopher Milan 2026-03-20 05:10:45 -07:00
  • 376585b003 use should_emulate for target dtype in decomp (#15392) Christopher Milan 2026-03-20 04:44:57 -07:00
  • a12d3951de fix test_export_model imports (#15389) Christopher Milan 2026-03-20 04:27:01 -07:00
  • 1a2a203f48 add wmma support to amd_copy_matmul (#15384) George Hotz 2026-03-20 19:02:19 +08:00
  • 1560b534a5 remove IMAGE=2 (#15312) Christopher Milan 2026-03-20 03:26:52 -07:00
  • 30d609432f ci: only xcode-select for gpuocelot on macos (#15387) Christopher Milan 2026-03-20 02:58:16 -07:00
  • d1b4e37dfa remove InvalidType branch in Tensor.__init__ (#15386) chenyu 2026-03-20 05:32:33 -04:00
  • c491345766 pass device into Tensor._frompy (#15385) chenyu 2026-03-20 05:09:01 -04:00
  • 3b75d8a7a2 fix double after bug in rangeify (#15381) George Hotz 2026-03-20 14:53:46 +08:00
  • 0c89340a1e automatically emulate unsupported (tiny) floats [skip_process_replay] (#15366) Christopher Milan 2026-03-19 23:31:44 -07:00
  • 78ad089817 make precompile the default for llm (#15376) George Hotz 2026-03-20 14:08:55 +08:00
  • 459ef41ea0 don't exclude weakint in is_dtype_supported [pr] (#15378) chenyu 2026-03-20 02:08:29 -04:00
  • cf6a429aaa mypy emulator pre-commit passing (#15379) qazal 2026-03-20 07:44:09 +02:00
  • 87c4ec1724 llama: use flat llama (#15353) wozeparrot 2026-03-20 13:12:38 +08:00
  • da1700e16b dtypes.index -> dtypes.weakint (#15377) chenyu 2026-03-20 01:08:46 -04:00
  • 3b04e3ea28 no gmmu mappings with GMMU=0 (#15369) nimlgen 2026-03-20 12:18:34 +08:00
  • c1183b8872 remove dead code in pyrender (#15115) ridoy majumdar 2026-03-19 23:59:56 -04:00
  • bf33c5f796 remove gradient materialize_grads (#15367) chenyu 2026-03-19 23:36:03 -04:00
  • 45baf3ff3f pin ci xcode version (#15375) chenyu 2026-03-19 23:13:16 -04:00
  • 4091d37e8e flat llama step work (#15355) George Hotz 2026-03-20 09:06:12 +08:00
  • 292e1745b2 Merge remote-tracking branch 'upstream/master' into new_x86_backend ttomsa 2026-03-19 20:54:58 +00:00
  • e81878abd9 enable gep noop rule ttomsa 2026-03-19 20:54:00 +00:00
  • 176ad47d7d cdna4 emulator testing ASM_GEMM in CI (#15373) qazal 2026-03-19 22:51:30 +02:00
  • 16daffc042 remote connection timeout (#15370) nimlgen 2026-03-19 19:44:16 +08:00
  • 68d7a6b7be PYTHONREMU: fix vop3p literals (#15372) Christopher Milan 2026-03-19 04:05:01 -07:00
  • 70dad9d642 add PING to RemoteCmd (#15371) George Hotz 2026-03-19 18:57:40 +08:00
  • 1c978aeedb amd: fix aql remote (#15368) nimlgen 2026-03-19 18:11:03 +08:00
  • 337c684047 viz: cycle time relative to kernel start in sidebar (#15352) qazal 2026-03-19 11:41:29 +02:00
  • d81b03cff4 pad_to to mixin [pr] (#15365) chenyu 2026-03-19 05:02:01 -04:00
  • 1abb6297f6 more Tensor(UOp) cleanups (#15364) chenyu 2026-03-19 03:34:30 -04:00
  • cf50ca23c3 better oom msg (#15362) nimlgen 2026-03-19 14:07:01 +08:00
  • 1a53393512 remote in ci benchmark (#15344) nimlgen 2026-03-19 13:49:09 +08:00
  • 92dfef8060 Tensor(uop) does not need explicit device (#15361) chenyu 2026-03-19 00:44:33 -04:00
  • f32c2e43a7 memory: use pfree (#15360) nimlgen 2026-03-19 12:39:23 +08:00
  • 86eec01f97 limit gl*lc (#15359) nimlgen 2026-03-19 12:38:55 +08:00
  • b39816e998 failed test case for Tensor(np, "bf16") (#15358) chenyu 2026-03-18 23:40:14 -04:00
  • e407ee410c cosmetic Tensor._do_reduction cleanups (#15357) chenyu 2026-03-18 22:27:50 -04:00
  • 6aebf95dac move neg and invert to mixin (#15356) chenyu 2026-03-18 22:03:41 -04:00
  • f6687d1ffc feat: sd seed0 update (#15354) wozeparrot 2026-03-19 09:42:00 +08:00
  • c45a606750 feat: no if in rand (#15333) wozeparrot 2026-03-19 06:09:51 +08:00
  • 23e0431848 viz: switch sqtt sidebar to a simple asm list (#15350) qazal 2026-03-18 18:40:25 +02:00
  • 709fc52d7b viz: fix auto zoom range in sqtt, include endpgm packet (#15349) qazal 2026-03-18 15:52:32 +02:00
  • d4836ddbb0 canonicalize device from tuple (#15348) nimlgen 2026-03-18 20:35:52 +08:00
  • 5524916e39 llama compute gradients explicitly + 243 GB of RAM on MP=8 (#15343) George Hotz 2026-03-18 19:54:40 +08:00
  • ff004d2114 remote: fix mmio (#15347) nimlgen 2026-03-18 18:20:39 +08:00
  • f853371c83 fix compilers autoselect (#15346) nimlgen 2026-03-18 18:19:53 +08:00
  • 761ce8c0d3 fix Invalid combine rules (#15345) chenyu 2026-03-18 04:58:02 -04:00
  • c0499ca3e8 nv: use mmio iface (#15342) nimlgen 2026-03-18 16:53:09 +08:00
  • 499ad9a356 benchmark openpilot 0.11.0 (#15341) Christopher Milan 2026-03-18 00:28:43 -07:00
  • 6e196195d8 add test for flat llama (#15327) George Hotz 2026-03-18 15:16:33 +08:00
  • fceb21c315 Tensor(uop) uses device from uop (#15340) chenyu 2026-03-18 02:56:06 -04:00
  • 6109117af1 anonymous buffers are Invalid (#15336) George Hotz 2026-03-18 14:52:56 +08:00
  • e644e1cb6a less Tensor(...).uop indirection in Tensor.__init__ (#15339) chenyu 2026-03-18 02:17:38 -04:00
  • 0315faf938 remote bench (#15331) nimlgen 2026-03-18 14:03:51 +08:00
  • d720d50e12 memory: traverse all valid ranges only (#15338) nimlgen 2026-03-18 14:03:39 +08:00
  • ac7a348d06 dtypes.as_const -> DType.const (#15337) chenyu 2026-03-18 00:48:41 -04:00
  • 864d3917d5 add openpilot onnx parser test (#15334) Christopher Milan 2026-03-17 21:12:02 -07:00
  • 0222bfdf69 Revert "don't use intermediate dict in onnx parse" (#15332) Christopher Milan 2026-03-17 20:46:30 -07:00
  • 94926d00d8 fix rand > uint32.max (#15330) chenyu 2026-03-17 22:00:01 -04:00
  • acdc232d65 fix ttomsa 2026-03-18 01:11:58 +00:00
  • 0dc615b588 Merge remote-tracking branch 'upstream/master' into new_x86_backend ttomsa 2026-03-18 00:52:27 +00:00
  • 449c79ada2 deal with flags correctly ttomsa 2026-03-18 00:49:05 +00:00
  • b45edeb965 fix: rand supports large tensors (#15329) wozeparrot 2026-03-18 06:45:41 +08:00
  • 00817cf65e viz: all tests can run on the NULL device (#15328) qazal 2026-03-17 21:14:20 +02:00
  • 2605840ee2 flat llama (#15324) George Hotz 2026-03-17 19:39:55 +08:00
  • 0a641ce17d system: remote (#15318) nimlgen 2026-03-17 19:25:37 +08:00