Commit Graph

  • 842c978df3 remove staticmethod dtypes.max/min (#15227) chenyu 2026-03-11 23:11:24 -04:00
  • 18dc77ccab add fp8 fnuz dtypes with PYTHON backend support (#14945) b1tg 2026-03-12 10:30:18 +08:00
  • 4f3f55328b do not patch on invalid tensor tests (#15226) George Hotz 2026-03-12 09:35:20 +08:00
  • b384c27314 ln ranged_call George Hotz 2026-03-12 08:23:06 +08:00
  • 5ed68aeed5 cleanups George Hotz 2026-03-12 08:18:31 +08:00
  • f682af2a31 better uop matmul George Hotz 2026-03-12 08:13:09 +08:00
  • 4fab320abe llama: clean (#15224) wozeparrot 2026-03-12 04:33:59 +08:00
  • 05d6d9120a llama offload null (#15222) wozeparrot 2026-03-12 01:04:31 +08:00
  • d3eef70162 viz: render shader clock frequency graph (#15197) qazal 2026-03-11 18:32:49 +02:00
  • 62dbf12655 call matmul George Hotz 2026-03-11 21:50:31 +08:00
  • 39b0f4bcc1 remove Ops.THREEFRY in remove_bufferize [pr] (#15220) chenyu 2026-03-11 05:30:33 -04:00
  • 6489a6f212 Revert "remove mop_cleanup [pr] (#15217)" (#15218) chenyu 2026-03-11 04:17:56 -04:00
  • 6b50df940a remove mop_cleanup [pr] (#15217) chenyu 2026-03-11 03:54:42 -04:00
  • b17e15d1aa support ranges on call George Hotz 2026-03-11 15:22:46 +08:00
  • 2fb8a7f60f fix test_invalid_tensor when before values are nan (#15215) Christopher Milan 2026-03-10 20:51:19 -07:00
  • fce87f19a8 better fold_add_divmod_recombine (#15214) chenyu 2026-03-10 23:24:22 -04:00
  • df8deec949 test for nest_by_factor selection (#15213) chenyu 2026-03-10 22:41:31 -04:00
  • be6b0bce1f variations of (x%c)+(x//c)*c (#15212) chenyu 2026-03-10 22:41:14 -04:00
  • a408d90f4f viz: always detect sqtt packet overlaps, add timeline tests (#15211) qazal 2026-03-10 22:32:38 +02:00
  • d9c7290eb0 nv: nvdec as NVDEC:0 device (#15209) nimlgen 2026-03-10 14:44:50 +03:00
  • 25d86ec9e1 start using Invalid in image_conv2d (#15208) Christopher Milan 2026-03-10 04:11:06 -07:00
  • ecbddfcffe clean up gcd_with_remainder [pr] (#15207) chenyu 2026-03-10 06:13:20 -04:00
  • bb7888b281 cleanup (x%(k*c))//c and (x%(k*c))%c (#15206) chenyu 2026-03-10 05:21:32 -04:00
  • 8389a8d7c5 remove_nested_mod can work with negative (#15205) chenyu 2026-03-10 03:10:08 -04:00
  • ffaafd391a Invalid in Tensor (#15154) Christopher Milan 2026-03-09 23:49:54 -07:00
  • 68c7c3ca84 divmod test_gcd_with_remainder (#15204) chenyu 2026-03-09 23:51:47 -04:00
  • a53187eef7 fix TestPartialAssignToSharedBuffer (#15202) chenyu 2026-03-09 23:14:23 -04:00
  • 525a178966 llama: jit more (#15199) wozeparrot 2026-03-10 11:04:59 +08:00
  • 221eafcd8d fix ttomsa 2026-03-10 01:41:23 +00:00
  • 315ad50a1a make late allreduce the default (#15125) George Hotz 2026-03-10 08:42:57 +08:00
  • 7115ed0c22 fix ttomsa 2026-03-10 00:21:04 +00:00
  • 1a52341196 fix ttomsa 2026-03-10 00:10:24 +00:00
  • 6b354b906d fold_divmod_general cleanups [pr] (#15196) chenyu 2026-03-09 19:43:16 -04:00
  • 037c5e6f82 Merge remote-tracking branch 'upstream/master' into new_x86_backend ttomsa 2026-03-09 23:33:52 +00:00
  • aaab4407af a lot better ttomsa 2026-03-09 23:29:49 +00:00
  • 02ceeab3a7 viz: ui cleanups from the sqtt real time branch (#15195) qazal 2026-03-09 22:33:53 +02:00
  • a615ed8ebe sqtt: update RDNA timestamp marker fields (#15194) qazal 2026-03-09 22:18:47 +02:00
  • e2e69dfe51 Merge branch 'master' into default_late_allreduce wozeparrot 2026-03-10 01:10:25 +08:00
  • 8bd6d270c5 rm ops.encdec (#15193) nimlgen 2026-03-09 18:52:48 +03:00
  • 81ab499b4b viz: small ui code cleanups (#15192) qazal 2026-03-09 14:17:33 +02:00
  • 60215deb60 tiebreak in fold_divmod_congruence (#15190) chenyu 2026-03-09 03:40:39 -04:00
  • a8d8351e5a match IDIV and MOD in nest_by_factor (#15188) chenyu 2026-03-09 00:50:38 -04:00
  • 7592622562 fix QCOMCLRenderer pickle (#15189) Christopher Milan 2026-03-08 21:36:16 -07:00
  • 2bb0970512 QCOM CL compiler prints LLVMIR when DEBUG>=8 (#15187) Christopher Milan 2026-03-08 21:15:20 -07:00
  • 83b80da8f3 even more divmod recombine (#15163) chenyu 2026-03-08 23:52:26 -04:00
  • 82f7734501 use backward_slice in reduce_mul_chain [pr] (#15186) chenyu 2026-03-08 21:44:53 -04:00
  • 25e82a9aca viz: exclude redundant traceback from SDMA (#15185) qazal 2026-03-08 22:12:14 +02:00
  • 6ac99fd4c9 memplanner opt copy bufs (#15110) nimlgen 2026-03-08 22:28:01 +03:00
  • 633264feae am: flush sdma pipeline (#15184) nimlgen 2026-03-08 20:27:56 +03:00
  • 891a73befc llm: fix chunked prefill (#15182) b1tg 2026-03-07 22:08:31 +08:00
  • 5d58b1c396 don't use intermediate dict in onnx parse (#15181) chenyu 2026-03-07 00:08:03 -05:00
  • 37a40bf975 early lower cat cat_mop George Hotz 2026-03-07 11:39:20 +08:00
  • af1db22b25 simpler George Hotz 2026-03-07 10:11:21 +08:00
  • be0f9d1055 min George Hotz 2026-03-07 10:00:29 +08:00
  • 086081e35b tbgpu: add stapler to the script (#15180) nimlgen 2026-03-07 00:07:27 +03:00
  • a03f512147 viz: clean up old / unused paths in sidebar rendering (#15179) qazal 2026-03-06 22:36:10 +02:00
  • 605b37c03f use backward_slice in count_divmod [pr] (#15178) chenyu 2026-03-06 14:03:53 -05:00
  • 5bdad8ee41 update mxfp4 tests to use the same patterns as the others (#15177) Ananta Ranganathan 2026-03-06 10:21:40 -08:00
  • d85109f9f7 viz: walk PROGRAM UOp back to source and binary only (#15174) qazal 2026-03-06 18:39:07 +02:00
  • 5c50035e0d avoid using arithmetic for mxfp4 (#15172) Ananta Ranganathan 2026-03-06 08:17:56 -08:00
  • f064db0ac6 viz: later tooltip rendering (#15170) qazal 2026-03-06 16:00:15 +02:00
  • 4ed8bb7445 tie break for divmod (#15169) Roelof van Dijk 2026-03-06 14:05:38 +01:00
  • 83f1faa142 sqtt: update CDNA wave packet field, start unskipping tests (#15168) qazal 2026-03-06 14:37:44 +02:00
  • 7810be8d3c compile QCOM without opening device (#15165) Christopher Milan 2026-03-06 03:24:27 -08:00
  • 5b9a6c5520 Add Ops.CAT movement op (ai slop) George Hotz 2026-03-06 18:25:12 +08:00
  • 6fd18ef875 rename CAT to VCAT (#15167) George Hotz 2026-03-06 18:46:28 +08:00
  • 465be0d333 Merge branch 'master' into new_x86_backend ttomsa 2026-03-06 00:32:35 +00:00
  • dd5076529b rm this for now ttomsa 2026-03-06 00:32:17 +00:00
  • 41f2bd8a05 more isel tests ttomsa 2026-03-06 00:28:36 +00:00
  • 059c6326c0 metal uint32 icb offset overflow (#15156) Roelof van Dijk 2026-03-05 22:54:39 +01:00
  • 255a788dea enable vector load/store on all dtypes ttomsa 2026-03-05 19:57:07 +00:00
  • da61088ca4 more divmod recombine (#15162) chenyu 2026-03-05 12:53:22 -05:00
  • 167a1d56a6 improve divmod folding (#15148) chenyu 2026-03-05 10:07:36 -05:00
  • b824579e4d simplify image_conv2d pitch alignment hacks (#15158) Christopher Milan 2026-03-05 04:17:34 -08:00
  • 5bf542469d viz: python traceback for USER device (#15160) qazal 2026-03-05 13:22:09 +02:00
  • d65923bda5 tensor.py: add normalize function (#15159) Roelof van Dijk 2026-03-05 11:55:53 +01:00
  • d7c915874a Merge branch 'master' into default_late_allreduce wozeparrot 2026-03-05 17:53:11 +08:00
  • 4544da1c54 llama3 fixes part3 (#15152) wozeparrot 2026-03-05 17:17:54 +08:00
  • fc0534910c q5k is like q4k (#15155) Roelof van Dijk 2026-03-05 10:02:49 +01:00
  • 8ef656324e FIXED TEST Q5_K GGUF dequant (#15147) Ananta Ranganathan 2026-03-05 00:32:36 -08:00
  • e97922a57c LLM speedup with two jits, prefill/rollout (#15153) George Hotz 2026-03-05 16:21:09 +08:00
  • be23772d43 llama3 fixes part2 (#15150) wozeparrot 2026-03-05 15:43:50 +08:00
  • 0c769289eb llama3: more scripts (#15107) wozeparrot 2026-03-05 14:18:03 +08:00
  • fb43b415f9 fix symbolic shape call + chunked prefill (#15149) George Hotz 2026-03-05 14:02:26 +08:00
  • 8a82b26522 llm: print the prefill cache size (#15146) George Hotz 2026-03-05 12:13:28 +08:00
  • b5370fd52d use copy_multi in alu_multi [pr] (#15143) chenyu 2026-03-04 22:53:00 -05:00
  • 72a9ed6e23 fix render depth bug + add warmup to serve + no realize default (#15144) George Hotz 2026-03-05 11:21:16 +08:00
  • ac1847cbf7 fully symbolic llm (#15097) George Hotz 2026-03-05 10:22:11 +08:00
  • 33a1970045 sqtt: simplify inst mapping, validate JUMP processing in CI (#15139) qazal 2026-03-05 02:53:12 +02:00
  • 04da527a7a minor div_and_mod_symbolic cleanups (#15138) chenyu 2026-03-04 19:05:44 -05:00
  • 106d18b792 use UOp methods in allreduce.py [pr] (#15137) chenyu 2026-03-04 17:15:33 -05:00
  • 34594bcaaf Revert "bug in metal: offset is stored as uint32, overflow (#15129)" (#15136) chenyu 2026-03-04 16:54:42 -05:00
  • b172b5d72c Merge branch 'master' into new_x86_backend ttomsa 2026-03-04 20:50:26 +00:00
  • 64f574572c regalloc takes renderer ttomsa 2026-03-04 20:49:47 +00:00
  • 9c58db16fa bug in metal: offset is stored as uint32, overflow (#15129) Roelof van Dijk 2026-03-04 20:52:12 +01:00
  • 4cce283790 relax test_tqdm_perf (#15134) chenyu 2026-03-04 12:58:47 -05:00
  • fae400d300 update assign tests to also test the expected behavior (#15132) chenyu 2026-03-04 11:34:43 -05:00
  • 1f96cc2b51 update non-contiguous buffer error message [pr] (#15131) chenyu 2026-03-04 11:13:26 -05:00
  • 563d5c3211 more graph tests (#15130) nimlgen 2026-03-04 19:01:12 +03:00
  • cdc48da9cd hevc: assert and speed (#15122) nimlgen 2026-03-04 19:01:02 +03:00