Commit Graph

  • 4e9b85ecfd fa: pull inputs out of call (#15127) wozeparrot 2026-03-04 19:15:49 +08:00
  • 47faa2d7b4 hotfix: llm kv cache uses clone instead of realize to avoid many realize George Hotz 2026-03-04 19:07:03 +08:00
  • 73331f4e55 print only_assign_buffers George Hotz 2026-03-04 18:55:37 +08:00
  • fe25f98f49 clone forces it to be backed by a real buffer George Hotz 2026-03-04 18:43:09 +08:00
  • 4f91db06d7 work George Hotz 2026-03-04 18:39:48 +08:00
  • 8ebd24637b fix fa forward building with clang 22 (#15124) George Hotz 2026-03-04 18:32:25 +08:00
  • 1a4826f802 only support assign to buffers/after George Hotz 2026-03-04 18:32:06 +08:00
  • 592f9bf6c6 set OPENPILOT_HACKS=1 to enable replace assign (#15123) Christopher Milan 2026-03-04 02:26:04 -08:00
  • 5241ee5a6c fix: override rocm path fix_fa_fwd_clang_22 Woze Parrot 2026-03-04 10:19:47 +00:00
  • 091443349c fix spec George Hotz 2026-03-04 18:05:00 +08:00
  • 6a912250c7 make late allreduce the default George Hotz 2026-03-04 17:49:12 +08:00
  • 138a6b6c40 fix fa forward building with clang 22 George Hotz 2026-03-04 17:42:44 +08:00
  • df23057984 fa: change bwd grid dim + unshuffle using mops (#15068) wozeparrot 2026-03-04 17:23:40 +08:00
  • 5623cea7b1 move openpilot contiguous hacks to schedule (#15120) Christopher Milan 2026-03-04 00:04:06 -08:00
  • 759c7fc81c failing test for allreduce memory usage (#15106) wozeparrot 2026-03-04 15:38:38 +08:00
  • 5ecfe549e7 allreduce is a function with LATE_ALLREDUCE=1 (#15119) George Hotz 2026-03-04 15:17:58 +08:00
  • 4662fb413f works late_allreduce George Hotz 2026-03-04 13:14:16 +08:00
  • ccb5dcf3b8 fix George Hotz 2026-03-04 13:01:35 +08:00
  • 6b82b51759 close George Hotz 2026-03-04 12:59:17 +08:00
  • e7e70a3c95 simplify idx before counting backward_slice (#15117) Christopher Milan 2026-03-03 20:53:50 -08:00
  • 2d72a4a90c fix copying padded const (#15116) George Hotz 2026-03-04 10:39:45 +08:00
  • b5ebb4d06d contiguous_view_offset returns only offset [pr] (#15113) chenyu 2026-03-03 15:23:39 -05:00
  • abd830b260 am: setup_rinf returns only doorbell (#15112) nimlgen 2026-03-03 19:27:41 +03:00
  • 4b42bb54aa am: reset sdma to start from 0 (#15109) nimlgen 2026-03-03 18:14:46 +03:00
  • 01ddb4c267 add precompile to call (#15099) George Hotz 2026-03-03 22:32:42 +08:00
  • c7f908b788 sqtt: fix rdna4 structs (#15111) qazal 2026-03-03 16:32:14 +02:00
  • 8dd691761d sqtt: remove old files (#15108) qazal 2026-03-03 15:43:24 +02:00
  • de043226ba benchmark comma usbgpu driving_vision step and load time (#15103) Christopher Milan 2026-03-03 03:08:03 -08:00
  • 5f6b610da1 FLOAT16 logic for IMAGE==1 goes back to image_conv2d (#15105) Christopher Milan 2026-03-03 02:37:57 -08:00
  • 529318259c fix: fix null tests to actually use null device (#15104) wozeparrot 2026-03-03 18:05:47 +08:00
  • 7d025089e3 no after removal (#15102) George Hotz 2026-03-03 17:50:31 +08:00
  • 92c16810ac feat: per device mem_used (#15100) wozeparrot 2026-03-03 17:31:28 +08:00
  • e3a0598d0b viz: the whole pc should be in view (#15101) qazal 2026-03-03 10:17:53 +02:00
  • a9ea36de79 assembly/amd: v_cmp_lg_f32 is ordered not-equal (#14982) b1tg 2026-03-03 15:37:48 +08:00
  • c35de9bd68 asm_gemm: support more sharding (#15002) wozeparrot 2026-03-03 15:16:37 +08:00
  • 824ba4386a llama3 dp fix (#15098) wozeparrot 2026-03-03 14:43:07 +08:00
  • 5dcf29b1a0 use clone in test_swap_slices (#15096) chenyu 2026-03-02 22:05:12 -05:00
  • c70e8af068 move IMAGE FLOAT16 logic to allocations (#15095) Christopher Milan 2026-03-02 19:00:05 -08:00
  • d483e4153a buffer view is like buffer (#15082) George Hotz 2026-03-03 09:52:33 +08:00
  • 62ee976c1b gemm/asm: cleanup repeated patterns to helper functions (#15094) qazal 2026-03-03 01:14:47 +02:00
  • 848f5cea96 viz: sqtt instruction packet trace (#15065) qazal 2026-03-03 00:55:04 +02:00
  • 3dc9bbd831 vpsrldq can't access memory ttomsa 2026-03-02 22:16:36 +00:00
  • cafa3b74d4 rm bad rewrite ttomsa 2026-03-02 21:32:52 +00:00
  • 393e591f49 Merge branch 'master' into new_x86_backend ttomsa 2026-03-02 21:14:07 +00:00
  • 82954c7ca4 support float16 vector load/store ttomsa 2026-03-02 21:13:07 +00:00
  • 14d1c5fdfd assign fusion tests on detach and contiguous_backward (#15092) chenyu 2026-03-02 15:21:51 -05:00
  • dfa180413d tbgpu: sign nv (#15087) nimlgen 2026-03-02 22:58:30 +03:00
  • 71f228f80f test exact kernel count in torch_backend/test_kernel_fusion (#15091) chenyu 2026-03-02 14:26:32 -05:00
  • f80b1033c5 simpler Tensor.all (#15089) chenyu 2026-03-02 11:08:55 -05:00
  • 4008f7d4e8 move Tensor.one_hot +1 to python (#15088) chenyu 2026-03-02 10:56:41 -05:00
  • dafbe9733a am: cleanup (#15086) nimlgen 2026-03-02 17:06:21 +03:00
  • f7aeff6061 viz: cli.py cleanups, do not require PYTHONPATH (#15085) qazal 2026-03-02 12:24:38 +02:00
  • 5ff278446c add contiguous_view_offset (#15084) George Hotz 2026-03-02 18:05:04 +08:00
  • 977c270774 IMAGE=1 kernel count failing tests (#15083) Christopher Milan 2026-03-02 01:35:26 -08:00
  • 3539693555 Support triu variable on diagonal + SDPA symbolic (#15081) George Hotz 2026-03-02 12:19:48 +08:00
  • a4f6365929 llama3: fstep takes grads (#15069) wozeparrot 2026-03-02 12:05:07 +08:00
  • 8e8e9f6ff6 assert removal for _tri() + tests (#15073) Nick 2026-03-01 21:34:28 -05:00
  • 4fb38e94ad symbolic llm with prefill (ai slop) sym_llm George Hotz 2026-03-02 02:13:42 +00:00
  • ccbbca05ef beam: add dev_timeout for am (#15063) nimlgen 2026-03-01 16:57:29 +03:00
  • 8cb4368967 delete unused END NOOP rule [pr] (#15077) chenyu 2026-03-01 00:09:05 -05:00
  • efce99adc9 skip isComposing key press in llm.py (#15076) chenyu 2026-02-28 20:31:53 -05:00
  • 103ea16ec0 add contiguous back to svd (#15074) chenyu 2026-02-28 16:49:26 -05:00
  • fe0fa8333b Revert "improve Tensor.sort indices (#15070)" (#15072) chenyu 2026-02-28 14:40:30 -05:00
  • e3003631f2 improve Tensor.sort indices (#15070) chenyu 2026-02-28 14:16:16 -05:00
  • cfc5cf65ad llama3: vocab padding fix + jit copies on fakedata (#15067) wozeparrot 2026-03-01 00:44:55 +08:00
  • 76170d035a relax atol for test_xlm_roberta_large (#15066) chenyu 2026-02-28 11:22:35 -05:00
  • cfb8e6922d viz: arrow keys move through time (#15064) qazal 2026-02-28 16:52:36 +02:00
  • 9b3450c9da test gpu crash on cdna (#15062) nimlgen 2026-02-28 13:17:59 +03:00
  • 6bbf813dd3 ci: switch to tinygrad/amdcomgr_dylib (#15061) nimlgen 2026-02-28 13:09:39 +03:00
  • 77846300b2 am: reset vm fault (#15060) nimlgen 2026-02-28 12:58:56 +03:00
  • dc54441e1f add better printing to tinygrad.apps.llm (#15059) George Hotz 2026-02-28 16:38:50 +08:00
  • bb84e389cf functions for llama trainer (#15045) George Hotz 2026-02-28 12:15:18 +08:00
  • 9b4ba3f838 remove ReduceContext.range_to_ends [pr] (#15055) chenyu 2026-02-27 22:15:44 -05:00
  • 151608aa90 update test_multiple_to_single_device (#15056) chenyu 2026-02-27 21:44:33 -05:00
  • eb4ad1ebf0 move max/min and add test ttomsa 2026-02-27 22:52:50 +00:00
  • ce2c690721 Merge branch 'master' into new_x86_backend ttomsa 2026-02-27 22:40:28 +00:00
  • a982a8709e canonical max ttomsa 2026-02-27 22:38:16 +00:00
  • 5fd06f4f02 differentiable setitem (#15054) chenyu 2026-02-27 17:25:15 -05:00
  • 94c317a437 print correct reg names ttomsa 2026-02-27 22:23:11 +00:00
  • db6b3e1edc fix mixed setitem with both basic and tensor indexing (#15050) chenyu 2026-02-27 15:35:48 -05:00
  • c9f6d8751b don't remove_bufferize for Invalid (#15053) chenyu 2026-02-27 15:16:09 -05:00
  • b8a55d5f68 sqtt: new packet types, add discovery script (#14960) qazal 2026-02-27 21:27:27 +02:00
  • 4e12fc3fe6 am: mi3xx recovery (#15051) nimlgen 2026-02-27 22:10:47 +03:00
  • 81a35cef38 rearrange Tensor.getitem code (#15049) chenyu 2026-02-27 12:57:16 -05:00
  • 1406d49eef failed test cases for advanced setitem (#15048) chenyu 2026-02-27 10:50:18 -05:00
  • ef1017f7ed viz: skip drawing offscreen tracks in profiler (#15047) qazal 2026-02-27 15:19:08 +02:00
  • ad99b77f6d assembly/amd: add gfx12_asm_vflat llvm tests, disasm fixes (#15046) qazal 2026-02-27 13:20:31 +02:00
  • 010d2790ce fix multi minimal (#15044) George Hotz 2026-02-27 14:31:58 +08:00
  • 3e1e12528c hotfix: disable tinyfs load test George Hotz 2026-02-27 12:04:41 +08:00
  • d23b79530e remove disk from GGUF GEMV test (#15041) George Hotz 2026-02-27 12:03:00 +08:00
  • d345f7f5dc remove _pending_assigns (#15040) chenyu 2026-02-26 22:38:10 -05:00
  • 37e31e7da4 gguf gemv test (#15039) George Hotz 2026-02-27 10:54:43 +08:00
  • af94bfc401 fix retinanet shared memory race condition in parallel tests (#15030) Nick 2026-02-26 19:36:24 -05:00
  • 2bbf8bbefa improve call/param rendering (#15023) George Hotz 2026-02-27 08:35:04 +08:00
  • 13c4f2fb04 linter ttomsa 2026-02-27 00:22:24 +00:00
  • 0f94a4bb73 failed test case for early fixup const copy (#15038) chenyu 2026-02-26 19:09:33 -05:00
  • ee4455952d Merge branch 'master' into new_x86_backend ttomsa 2026-02-26 23:43:17 +00:00
  • c247bdd9b9 cleanups ttomsa 2026-02-26 23:42:41 +00:00
  • 3a4db53b43 raise RuntimeError in schedule for conflicted var_val [pr] (#15031) chenyu 2026-02-26 15:16:01 -05:00
  • d65db32395 viz: only compute aggregate memory graph, defer n² per buffer graph (#15029) qazal 2026-02-27 03:14:51 +08:00