Commit Graph

1299 Commits

Author SHA1 Message Date
George Hotz
770a558585 lil cleanups from uop branch [pr] (#11197) 2025-07-12 09:46:28 -07:00
nimlgen
ea7f2f779c hcq: p2p nv-amd (#11195)
* hcq: p2p between diff devices

* fix
2025-07-12 18:53:34 +03:00
nimlgen
6f5250d158 nv: fix typing in rpc_rm_control (#11189) 2025-07-12 16:09:42 +03:00
uuuvn
d11b20129d DMARef infra (#10753)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-11 14:09:47 -07:00
nimlgen
f9e4c4e57a nv: nvpci blackwell support (#11127)
* nv: start 5090

* gsp init 5090

* mmu

* works

* after merge

* cleaner

* rwk

* x

* fx

* finish?

* fix

* unrelated

* fix

* comment
2025-07-11 17:02:09 +03:00
nimlgen
c7f6b617b4 nv: do not hardcode lv0 pd size (#11180) 2025-07-11 16:26:18 +03:00
nimlgen
27922c986a nv: generic mmu impl (#11179) 2025-07-11 16:26:09 +03:00
nimlgen
cc6ed30f4f nv: relative lv addressing in NVPageTableEntry (#11164) 2025-07-10 22:35:50 +03:00
qazal
bde80c0cdf record GraphEvents in metal graph (#11145)
* record GraphEvents in metal graph

* add TestProfiler.test_graph, revert old stuff

* move profile capture to MetalGraph

* comment

* don't double record graph command buffers

* wait_check

* explicit delete
2025-07-10 21:32:06 +03:00
nimlgen
581397110f nv: use classes in GSP_IP (#11163) 2025-07-10 17:47:12 +03:00
nimlgen
705de6b8a6 nv: parse sizes of ctx buffers (#11161) 2025-07-10 17:46:48 +03:00
Pyry Kovanen
32117402dd metal: fix incorrect _free on interpreter exit (#11158) 2025-07-10 14:01:30 +03:00
George Hotz
53ae153404 tc should be in opt (#11148)
* tc should be in opt [pr]

* fix import
2025-07-09 14:12:21 -07:00
wozeparrot
6697d0089d initial gfx950 kfd support (#11151)
* feat: initial gfx950 support

* fix: lint
2025-07-09 13:45:16 -07:00
nimlgen
b6981404ed memory: use page shifts in memory manager (#11149)
* memory: use page shifts in memory manager

* fix
2025-07-09 22:05:00 +03:00
George Hotz
22305260e0 move tc to tc.py [pr] (#11147) 2025-07-09 10:55:56 -07:00
nimlgen
43650169f4 nv: switch headers to 570.144 to match gsp (#11131) 2025-07-08 20:29:06 +03:00
nimlgen
b516fe71b4 nv: return real struct in _alloc_boot_struct (#11130) 2025-07-08 20:04:43 +03:00
qazal
3dfc0ff887 move cpu_profile and shared ProfileEvents from device.py to helpers [pr] (#11126)
* move cpu_profile and shared ProfileEvents to helpers [pr]

* TestProfiler.test_cpu_profile

* update test_viz.py

* TestProfiler.test_profile_multiops ordering, it's different streams now
2025-07-08 12:14:03 +03:00
nimlgen
71377cd233 nv: parse falcon app descs (#11118) 2025-07-07 18:14:14 +03:00
nimlgen
9a573a1d99 nv: finalize nvdev (#11117)
* nv: finalize nvdev

* typo
2025-07-07 16:31:59 +03:00
nimlgen
fa59c05282 nv: import flags from system (#11115)
* nv: import flags from system

* not used
2025-07-07 14:46:49 +03:00
nimlgen
b73e89110e nv: align allocations for perf (#11114) 2025-07-06 22:32:11 +03:00
nimlgen
577afc9f05 hcq: remove redundant syncs and fix typing (#11096)
Before this patch, the code could issue redundant syncs because of
the typing issue. Existing tests should cover all correctness checks.
2025-07-04 21:49:47 +03:00
nimlgen
6656aa162c nv: enable huge pages (#11091) 2025-07-04 17:17:24 +03:00
nimlgen
01f3c4f44d memory: simpler paddr allocation logic (#11090)
* memory: new paddr allocation logic

* am fix

* am refactors

* fix

* mypy

* use it

* am
2025-07-04 17:00:36 +03:00
nimlgen
e02ee8ef1b nv: cleanups from 5090 (#11081) 2025-07-04 00:08:47 +03:00
nimlgen
2d138c6cf1 am: factor out init_sw (#11070) 2025-07-03 11:01:17 +03:00
quortus
17d85b9793 Refactor STORE implementation in ops_python (#11060) 2025-07-02 14:29:12 -07:00
nimlgen
6067568087 nv: remove hardcoded CTRL_CMD_VASPACE_COPY_SERVER_RESERVED_PDES (#11057) 2025-07-02 20:41:10 +03:00
Ahmed Harmouche
e992ed10dc WebGPU on Windows (#10890)
* WebGPU on Windows

* Fix dawn-python install

* New test

* pydeps

* Minor fix

* Only install dawn-python on windows webgpu

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-02 08:38:45 -07:00
nimlgen
e67a6d2310 nv: tiny cleanups (#11053) 2025-07-02 18:37:32 +03:00
b1tg
fcbefde8f5 fix DiskDevice reuse (#11039)
* fix DiskDevice reuse

* fix mypy and DiskDevice.count

* mypy

* add test

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-07-01 10:29:21 -04:00
nimlgen
9ea7deb515 hcq: select_iface shared (#11033)
* hcq: select_iface shared

* errs

* sorry

* upprt
2025-06-30 21:12:39 +03:00
chenyu
126fcf4129 clean up AMD_LLVM in tests (#11021) 2025-06-28 22:45:47 -04:00
nimlgen
e53673a0b2 amd: sdma queue overrun fix (#11012)
* amd: sdma queue overrun fix

* add ()

* fix

* bug

* this is correct
2025-06-28 01:42:03 +03:00
George Hotz
be53ef4f0a rename DEFINE_ACC -> DEFINE_REG (#11006)
* rename DEFINE_ACC -> DEFINE_REG

* add CMPEQ to groupops
2025-06-27 11:09:25 -07:00
nimlgen
1c45b9f7fb start nvpci (#10521)
* start nvpci

* talk to fsp

* boot args

* riscv core booted

* q

* agen

* got gsp init msg

* some fixes

* set registry, stuck aft lockdown(

* start ga/ad port

* gsp init on ada

* more classes allocated

* more

* mm

* fixes and progress

* no huge pages for now

* mm seems working, but switch to 512MB page for simplicity

* working state

* not cleaned

* cleaned

* nvd=1

* start gr ctx

* compute

* clean 1

* cleanup 2

* cleanup 3

* cleaner 4

* cleaner 6

* add iface to nv

* save before reboot

* merged into NV

* moveout mm

* post merge

* cleaner 7

* merge and rebase

* pciiface abstraction + reset

* download fw from web

* print logs

* minor changes + p2p

* cleaner 8

* cleaner 9

* cleaner 10

* delete

* delete this as well

* linter 1

* oops

* priv_client -> priv_root

* fix mypy

* mypy?

* mypy?

* small changes

* shorter

* ops

* remove this

* do not allocate paddr for reserve

* nodiff

* unified script

* ops

* dif ver

* add lock

* setup
2025-06-25 00:37:34 +03:00
uuuvn
c8d0f68763 Weaker renderer validation in remote (#10964)
```
training bert
training on ['REMOTE:0', 'REMOTE:1', 'REMOTE:2', 'REMOTE:3', 'REMOTE:4', 'REMOTE:5']
Traceback (most recent call last):
  File "/home/uuuvn/src/tinygrad/examples/mlperf/model_train.py", line 1300, in <module>
    with Profiling(enabled=getenv("PYPROFILE")): globals()[nm]()
                                                 ^^^^^^^^^^^^^^^
  File "/home/uuuvn/src/tinygrad/examples/mlperf/model_train.py", line 975, in train_bert
    for x in GPUS: Device[x]
                   ~~~~~~^^^
  File "/home/uuuvn/src/tinygrad/tinygrad/device.py", line 22, in __getitem__
    def __getitem__(self, ix:str) -> Compiled: return self.__get_canonicalized_item(self.canonicalize(ix))
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uuuvn/src/tinygrad/tinygrad/device.py", line 28, in __get_canonicalized_item
    ret = [cls for cname, cls in inspect.getmembers(importlib.import_module(f'{base}.runtime.ops_{x}')) \
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uuuvn/src/tinygrad/tinygrad/runtime/ops_remote.py", line 417, in __init__
    if not renderer[0].startswith("tinygrad.renderer.") or not renderer[1].endswith("Renderer"): raise RuntimeError(f"bad renderer {renderer}")
                                                                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: bad renderer ('tinygrad.runtime.ops_null', 'NullRenderer', ())
```
2025-06-24 14:15:09 -07:00
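The traceback in c8d0f68763 shows why the check at `ops_remote.py` line 417 fails: the remote host reports `('tinygrad.runtime.ops_null', 'NullRenderer', ())`, and the module path does not start with `tinygrad.renderer.`. The sketch below reproduces that strict check as a standalone function and contrasts it with one possible weaker check. The function names and the relaxed prefix (`tinygrad.`) are assumptions for illustration only; the log does not show what the commit actually changed.

```python
# Strict check, copied in spirit from the line shown in the traceback.
def strict_renderer_check(renderer):
    if not renderer[0].startswith("tinygrad.renderer.") or not renderer[1].endswith("Renderer"):
        raise RuntimeError(f"bad renderer {renderer}")

# Hypothetical weaker check: accept any module under the tinygrad package.
# This is an assumption, not the code merged in #10964.
def relaxed_renderer_check(renderer):
    if not renderer[0].startswith("tinygrad.") or not renderer[1].endswith("Renderer"):
        raise RuntimeError(f"bad renderer {renderer}")

# Small helper so both checks can be compared without try/except at each call site.
def is_rejected(check, renderer):
    try:
        check(renderer)
    except RuntimeError:
        return True
    return False

# The tuple reported in the traceback.
null_renderer = ("tinygrad.runtime.ops_null", "NullRenderer", ())

if __name__ == "__main__":
    print("strict rejects null renderer:", is_rejected(strict_renderer_check, null_renderer))
    print("relaxed rejects null renderer:", is_rejected(relaxed_renderer_check, null_renderer))
```

Either variant still rejects a module outside the `tinygrad` package, which is presumably the point of validating a renderer description received from a remote host.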
nimlgen
26ddf8d714 amd: rename dev_iface -> iface to match nv (#10959) 2025-06-24 20:22:19 +03:00
nimlgen
c0d9cf09e0 system: flock (#10949)
* system: flock

* imports

* xx
2025-06-24 11:33:49 +03:00
nimlgen
5202970feb system: move memory_barrier to System (#10948)
* system: move memory_barrier to System

* fixed
2025-06-24 11:09:43 +03:00
George Hotz
0f89660ce4 Revert "change clang -march flag to -mcpu on arm (#10841)" (#10942)
This reverts commit 897e42fd1b.
2025-06-23 16:48:28 -07:00
ttomsa
897e42fd1b change clang -march flag to -mcpu on arm (#10841)
* change clang -march flag to -mcpu with fp16 disassembly test

* fix

* add capstone to macos dependencies

* just check no cast in test

* rm import

* woops

* lets check

* move check

* llvm init before cpu check

* try this

* bump autogen llvm version

* also update libclang?

* revert

* add comment

* skip llvm test and add comment

* linter
2025-06-23 16:28:48 -07:00
uuuvn
4e2c9e36c7 Remote multihost (p2p transfer) (#10601) 2025-06-23 11:47:29 -07:00
nimlgen
eceb7a00d2 nv: rename iface mem functions (#10931) 2025-06-23 16:34:51 +03:00
nimlgen
3ccdb2356b system: factor out PCIIfaceBase (#10917)
* system: factor out PCIIfaceBase

* linter

* typing
2025-06-22 20:03:14 +03:00
nimlgen
36536ef6f0 nv: minor changes from nvpci (#10918) 2025-06-22 18:04:39 +03:00
nimlgen
0e7bd9fd03 factor out generic MemoryManager (#10910)
* allocator -> memory

* just moveout it

* mm is abstracted

* need entry abstraction

* fix

* mypy
2025-06-21 16:18:33 +03:00
nimlgen
bb0299b9e5 system: shared pci logic (#10894)
* moveout pci logic

* fixes

* oops

* types

* more type

* one style

* thi is imp
2025-06-21 00:09:49 +03:00