Commit Graph

  • 48a7627b04 add RDNA4 support to copy WMMA (#15663) George Hotz 2026-04-09 22:48:20 +08:00
  • 6837881b06 remove same_shape_noop [pr] (#15662) chenyu 2026-04-09 09:50:26 -04:00
  • 1568c92f7d cleanups x86_moves George Hotz 2026-04-09 20:43:24 +08:00
  • 5d09363b5f simpler George Hotz 2026-04-09 20:12:38 +08:00
  • c012f9c5a7 move define -> regalloc George Hotz 2026-04-09 20:06:54 +08:00
  • 934c0c5797 const George Hotz 2026-04-09 19:42:33 +08:00
  • c559c29d0b works George Hotz 2026-04-09 19:15:05 +08:00
  • e528bc389b move x86 stuff to correct places George Hotz 2026-04-09 18:57:19 +08:00
  • bd6d7e22ce c.Struct cleanup (#15640) Christopher Milan 2026-04-08 17:07:16 -07:00
  • fb40e711dd viz/cli: add pmc printer (#15651) qazal 2026-04-09 02:50:54 +03:00
  • 62a7b84aba fix merge_reduce_ends (#15659) chenyu 2026-04-08 17:20:01 -04:00
  • d08c76d9cb c.Struct cleanup (#15640) Christopher Milan 2026-04-08 17:07:16 -07:00
  • 742b3894d7 viz/cli: add pmc printer (#15651) qazal 2026-04-09 02:50:54 +03:00
  • 4cf2759fc8 fix merge_reduce_ends (#15659) chenyu 2026-04-08 17:20:01 -04:00
  • a17988a52d add callee saved registers ttomsa 2026-04-08 21:00:37 +01:00
  • 12f073e137 Merge branch 'master' into new_x86_backend ttomsa 2026-04-08 20:29:59 +01:00
  • cb681da840 move UOp.pad to mixin (#15657) chenyu 2026-04-08 13:15:19 -04:00
  • 28b14b0e38 mlx: remove to_be, use helpers (#15655) nimlgen 2026-04-08 20:07:28 +03:00
  • 1b44cb2ac6 split update stat from execitem (#15654) nimlgen 2026-04-08 20:07:12 +03:00
  • 71c83cc3f6 viz: put OTHER_ on the wave row (#15650) qazal 2026-04-08 17:13:44 +03:00
  • 839d37b7bc update median_step_time in model_train.py (#15649) chenyu 2026-04-08 09:53:59 -04:00
  • dae9dea903 clean up tensor random functions (#15648) chenyu 2026-04-08 09:44:37 -04:00
  • 1ebeb52e59 RDNA4 asm gemm (#15427) George Hotz 2026-04-08 21:26:44 +08:00
  • b1e52ba0c2 the slowest line in hcq graph (#15635) nimlgen 2026-04-08 15:53:52 +03:00
  • 3ac16b3bea viz: add wmma row, update exec duration logic (#15646) qazal 2026-04-08 14:24:23 +03:00
  • 35e3983840 Add Q5_0, Q5_1, and bfloat16 GGUF types (#15644) George Hotz 2026-04-08 17:16:19 +08:00
  • 39a029ec55 remove ASM_GEMM context var (#15645) qazal 2026-04-08 12:02:40 +03:00
  • dc6a51e44d viz: add # of bytes to sdma (#15639) qazal 2026-04-08 11:43:37 +03:00
  • 70dbd35023 llama: move custom_kernel into flat_llama (#15643) wozeparrot 2026-04-08 15:19:14 +08:00
  • bcf6931a4f fix: comma 4 does not have pcie (#15642) Christopher Milan 2026-04-07 20:57:03 -07:00
  • f930579b7a llm: change the default port to 8000 so you can remember it (match vLLM) George Hotz 2026-04-08 11:25:38 +08:00
  • 35932239d0 Merge branch 'master' into rdna4_gemm rdna4_gemm George Hotz 2026-04-08 10:36:34 +08:00
  • bf3763526a llm: buffer SSE chunks to fix parse errors from split reads (#15641) b1tg 2026-04-08 10:26:23 +08:00
  • a508b8fd2a viz: delete redundant things (#15637) qazal 2026-04-08 01:18:04 +03:00
  • 9c6e925b56 move lerp to mixin (#15634) chenyu 2026-04-07 15:13:00 -04:00
  • 890286e8d6 update llama profile.sh (#15633) qazal 2026-04-07 21:18:45 +03:00
  • b78b384d58 mlx: graph (#15621) nimlgen 2026-04-07 19:43:51 +03:00
  • d29f0ef721 viz: speed up profiler first render (#15632) qazal 2026-04-07 17:07:09 +03:00
  • 29582199c1 Merge branch 'master' into new_x86_backend George Hotz 2026-04-07 21:16:43 +08:00
  • d3de63d998 improvements to apps.llm (#15631) George Hotz 2026-04-07 20:34:05 +08:00
  • 2b01ca59dd USB driver for custom ASM firmware (#15597) George Hotz 2026-04-07 13:45:41 +08:00
  • 810d7c00cd llama: unify scripts (#15628) wozeparrot 2026-04-07 11:28:08 +08:00
  • 19e96497ee interface in DEV (#15620) Christopher Milan 2026-04-06 16:59:28 -07:00
  • 8ba58304f7 viz: reenable tests (#15626) qazal 2026-04-07 01:52:44 +03:00
  • 14be3279c1 Merge branch 'master' into new_x86_backend ttomsa 2026-04-06 23:29:16 +01:00
  • 2f7d085450 shared _normalize_indices for getitem (#15625) chenyu 2026-04-06 17:45:36 -04:00
  • 66ec188d50 more activations to mixin (#15624) chenyu 2026-04-06 15:41:41 -04:00
  • 1483f7e71c support shift by Tensor (#15623) chenyu 2026-04-06 15:14:57 -04:00
  • 6e30a5f5ea update shifts in torch backend (#15622) chenyu 2026-04-06 14:08:33 -04:00
  • a444be172d lower fuzz_symbolic_symbolic_div timeout (#15619) chenyu 2026-04-06 12:58:29 -04:00
  • 01b49c8647 support int operand for shifts (#15618) chenyu 2026-04-06 12:32:12 -04:00
  • e2700475cf mlx: cleaner (#15617) nimlgen 2026-04-06 17:49:47 +03:00
  • 2e61817001 fix ttomsa 2026-04-06 04:55:21 +01:00
  • 8868abe830 fix ttomsa 2026-04-06 04:23:15 +01:00
  • 8346332061 fix ttomsa 2026-04-06 02:49:02 +01:00
  • 91bf07e702 Merge branch 'master' into new_x86_backend ttomsa 2026-04-06 02:34:12 +01:00
  • 86c4431d74 add gpu_family detection to Metal, target MSL 4.0 on macOS 26+ (#15079) Valtteri Valo 2026-04-06 01:51:38 +03:00
  • ff0c941548 remove redundant iteration and toposort in _deepwalk (#15532) 13Perrius 2026-04-05 15:38:45 -07:00
  • e39cfe685a validate lr, momentum, weight_decay in optimizers (#15576) Andrew Cappelli 2026-04-05 18:37:34 -04:00
  • 6a334ceb27 hotfix: fix bert (#15613) nimlgen 2026-04-05 23:41:21 +03:00
  • e3986a6b74 mlx: init runtime (#15612) nimlgen 2026-04-05 22:52:29 +03:00
  • e0988dbae5 hcq: support non for signal_t and compute_t (#15611) nimlgen 2026-04-05 18:56:47 +03:00
  • 5e134aa087 hcq: add write/poll_bit commands (#15610) nimlgen 2026-04-05 18:09:44 +03:00
  • 604cdbf2f7 am: large allocs aligned to 2mb to use 2mb pages (#15609) nimlgen 2026-04-05 18:01:31 +03:00
  • b2d5b29f45 assembly/amd: validate dsl keyword args (#15608) qazal 2026-04-05 17:00:24 +03:00
  • 056fcd7758 viz: web work from rdna4 gemm (#15607) qazal 2026-04-05 13:14:16 +03:00
  • 7e54992bf6 fp8 llama (#15588) wozeparrot 2026-04-05 09:24:57 +08:00
  • 891807a1b9 Merge remote-tracking branch 'upstream/master' into new_x86_backend ttomsa 2026-04-05 00:05:11 +01:00
  • 4d36366717 assembly/amd: match rdna4 hw gidx init in emulator (#15604) qazal 2026-04-04 20:28:18 +03:00
  • 2ba5a6ddc8 remove detach in selu (#15602) chenyu 2026-04-04 11:04:29 -04:00
  • f7aed180e4 viz/cli: add Other row in profiler (#15600) qazal 2026-04-04 16:40:53 +03:00
  • 74ecf6d3e6 opaque structs are also c.Struct (#15596) Christopher Milan 2026-04-03 16:40:43 -07:00
  • 645d45d968 DEV has arch (#15577) Christopher Milan 2026-04-03 16:17:19 -07:00
  • 902edc3781 hcq: hcqbuf in copy (#15595) nimlgen 2026-04-03 22:47:36 +03:00
  • 2c4271209e hcq: peer groups for remote (#15594) nimlgen 2026-04-03 19:03:07 +03:00
  • 8fdef2d3e4 mean/std/var to mixin (#15593) chenyu 2026-04-03 10:42:41 -04:00
  • 9920b42b5e hotfix: renderer.target.arch in disasm (#15592) qazal 2026-04-03 16:23:51 +03:00
  • 237084b276 remote: support several hosts (#15585) nimlgen 2026-04-03 11:22:15 +03:00
  • 0ed8d9271d Renderers accept Target or nothing (#15590) Christopher Milan 2026-04-02 22:09:41 -07:00
  • 3a26920141 feat: framework ci (#15589) wozeparrot 2026-04-03 13:03:51 +08:00
  • 830a147a52 Revert "good stuff in USB" fancy_usb George Hotz 2026-04-03 12:19:57 +08:00
  • 0b0ea63439 Merge remote-tracking branch 'upstream/master' into new_x86_backend ttomsa 2026-04-03 04:12:01 +01:00
  • 736fea8412 select_first_inited cleanup and better errors (#15587) Christopher Milan 2026-04-02 16:27:58 -07:00
  • 8c50da800d [pr] cleanup unused ctx's in codegen (#15586) Christopher Milan 2026-04-02 16:06:58 -07:00
  • 694dc5a717 install script in benchmark (#15584) nimlgen 2026-04-02 18:15:58 +03:00
  • 046c3f1240 mlx: add loopback with send/recv (#15583) nimlgen 2026-04-02 18:15:46 +03:00
  • c64226e97c fix CreationMixin doc (#15582) chenyu 2026-04-02 09:46:28 -04:00
  • d8c2836099 good stuff in USB George Hotz 2026-04-02 18:34:03 +08:00
  • fefb0ebc2a gemm/asm: fp8 cleanups (#15580) qazal 2026-04-02 13:02:38 +03:00
  • 61bc91aa8c Tensor cumalu cleanups (#15579) chenyu 2026-04-02 05:23:22 -04:00
  • 4c654024bc good stuff in USB George Hotz 2026-04-02 11:23:30 +08:00
  • 1aa04eab08 simple CreationMixin (#15567) chenyu 2026-04-01 23:00:56 -04:00
  • 5b2a3251c4 mlperf system json for mi350 (#15575) wozeparrot 2026-04-02 06:30:33 +08:00
  • 6c67bd4c14 better error message when invalid renderer is specified (#15573) Christopher Milan 2026-04-01 14:12:55 -07:00
  • 0d6fbc2355 remove flaky and redundant image test (#15574) Christopher Milan 2026-04-01 13:33:13 -07:00
  • 20f7f0be8e nir renderers use arch (#15556) Christopher Milan 2026-04-01 13:32:51 -07:00
  • 148ad09559 am: do not use dbell for ih (#15571) nimlgen 2026-04-01 21:34:21 +03:00
  • 93a85c7348 am: raise when using more sdma engines (#15569) nimlgen 2026-04-01 21:33:42 +03:00
  • da12c2ea16 better install msg (#15570) nimlgen 2026-04-01 20:09:37 +03:00
  • 20497f2840 fold BIND to CONST when min==max (#15568) b1tg 2026-04-01 23:19:04 +08:00