George Hotz
770a558585
lil cleanups from uop branch [pr] ( #11197 )
2025-07-12 09:46:28 -07:00
nimlgen
ea7f2f779c
hcq: p2p nv-amd ( #11195 )
...
* hcq: p2p between diff devices
* fix
2025-07-12 18:53:34 +03:00
nimlgen
6f5250d158
nv: fix typing in rpc_rm_control ( #11189 )
2025-07-12 16:09:42 +03:00
uuuvn
d11b20129d
DMARef infra ( #10753 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-11 14:09:47 -07:00
nimlgen
f9e4c4e57a
nv: nvpci blackwell support ( #11127 )
...
* nv: start 5090
* gsp init 5090
* mmu
* works
* after merge
* cleaner
* rwk
* x
* fx
* finish?
* fix
* unrelated
* fix
* comment
2025-07-11 17:02:09 +03:00
nimlgen
c7f6b617b4
nv: do not hardcode lv0 pd size ( #11180 )
2025-07-11 16:26:18 +03:00
nimlgen
27922c986a
nv: generic mmu impl ( #11179 )
2025-07-11 16:26:09 +03:00
nimlgen
cc6ed30f4f
nv: relative lv addressing in NVPageTableEntry ( #11164 )
2025-07-10 22:35:50 +03:00
qazal
bde80c0cdf
record GraphEvents in metal graph ( #11145 )
...
* record GraphEvents in metal graph
* add TestProfiler.test_graph, revert old stuff
* move profile capture to MetalGraph
* comment
* don't double record graph command buffers
* wait_check
* explicit delete
2025-07-10 21:32:06 +03:00
nimlgen
581397110f
nv: use classes in GSP_IP ( #11163 )
2025-07-10 17:47:12 +03:00
nimlgen
705de6b8a6
nv: parse sizes of ctx buffers ( #11161 )
2025-07-10 17:46:48 +03:00
Pyry Kovanen
32117402dd
metal: fix incorrect _free on interpreter exit ( #11158 )
2025-07-10 14:01:30 +03:00
George Hotz
53ae153404
tc should be in opt ( #11148 )
...
* tc should be in opt [pr]
* fix import
2025-07-09 14:12:21 -07:00
wozeparrot
6697d0089d
initial gfx950 kfd support ( #11151 )
...
* feat: initial gfx950 support
* fix: lint
2025-07-09 13:45:16 -07:00
nimlgen
b6981404ed
memory: use page shifts in memory manager ( #11149 )
...
* memory: use page shifts in memory manager
* fix
2025-07-09 22:05:00 +03:00
George Hotz
22305260e0
move tc to tc.py [pr] ( #11147 )
2025-07-09 10:55:56 -07:00
nimlgen
43650169f4
nv: switch headers to 570.144 to match gsp ( #11131 )
2025-07-08 20:29:06 +03:00
nimlgen
b516fe71b4
nv: return real struct in _alloc_boot_struct ( #11130 )
2025-07-08 20:04:43 +03:00
qazal
3dfc0ff887
move cpu_profile and shared ProfileEvents from device.py to helpers [pr] ( #11126 )
...
* move cpu_profile and shared ProfileEvents to helpers [pr]
* TestProfiler.test_cpu_profile
* update test_viz.py
* TestProfiler.test_profile_multiops ordering, it's different streams now
2025-07-08 12:14:03 +03:00
nimlgen
71377cd233
nv: parse falcon app descs ( #11118 )
2025-07-07 18:14:14 +03:00
nimlgen
9a573a1d99
nv: finalize nvdev ( #11117 )
...
* nv: finalize nvdev
* typo
2025-07-07 16:31:59 +03:00
nimlgen
fa59c05282
nv: import flags from system ( #11115 )
...
* nv: import flags from system
* not used
2025-07-07 14:46:49 +03:00
nimlgen
b73e89110e
nv: align allocations for perf ( #11114 )
2025-07-06 22:32:11 +03:00
nimlgen
577afc9f05
hcq: remove redundant syncs and fix typing ( #11096 )
...
Before this patch, the code could issue redundant syncs because of
the typing issue. Current tests should cover all correctness checks.
2025-07-04 21:49:47 +03:00
nimlgen
6656aa162c
nv: enable huge pages ( #11091 )
2025-07-04 17:17:24 +03:00
nimlgen
01f3c4f44d
memory: simpler paddr allocation logic ( #11090 )
...
* memory: new paddr allocation logic
* am fix
* am refactors
* fix
* mypy
* use it
* am
2025-07-04 17:00:36 +03:00
nimlgen
e02ee8ef1b
nv: cleanups from 5090 ( #11081 )
2025-07-04 00:08:47 +03:00
nimlgen
2d138c6cf1
am: factor out init_sw ( #11070 )
2025-07-03 11:01:17 +03:00
quortus
17d85b9793
Refactor STORE implementation in ops_python ( #11060 )
2025-07-02 14:29:12 -07:00
nimlgen
6067568087
nv: remove hardcoded CTRL_CMD_VASPACE_COPY_SERVER_RESERVED_PDES ( #11057 )
2025-07-02 20:41:10 +03:00
Ahmed Harmouche
e992ed10dc
WebGPU on Windows ( #10890 )
...
* WebGPU on Windows
* Fix dawn-python install
* New test
* pydeps
* Minor fix
* Only install dawn-python on windows webgpu
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-02 08:38:45 -07:00
nimlgen
e67a6d2310
nv: tiny cleanups ( #11053 )
2025-07-02 18:37:32 +03:00
b1tg
fcbefde8f5
fix DiskDevice reuse ( #11039 )
...
* fix DiskDevice reuse
* fix mypy and DiskDevice.count
* mypy
* add test
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-07-01 10:29:21 -04:00
nimlgen
9ea7deb515
hcq: select_iface shared ( #11033 )
...
* hcq: select_iface shared
* errs
* sorry
* upprt
2025-06-30 21:12:39 +03:00
chenyu
126fcf4129
clean up AMD_LLVM in tests ( #11021 )
2025-06-28 22:45:47 -04:00
nimlgen
e53673a0b2
amd: sdma queue overrun fix ( #11012 )
...
* amd: sdma queue overrun fix
* add ()
* fix
* bug
* this is correct
2025-06-28 01:42:03 +03:00
George Hotz
be53ef4f0a
rename DEFINE_ACC -> DEFINE_REG ( #11006 )
...
* rename DEFINE_ACC -> DEFINE_REG
* add CMPEQ to groupops
2025-06-27 11:09:25 -07:00
nimlgen
1c45b9f7fb
start nvpci ( #10521 )
...
* start nvpci
* talk to fsp
* boot args
* riscv core booted
* q
* agen
* got gsp init msg
* some fixes
* set registry, stuck aft lockdown(
* start ga/ad port
* gsp init on ada
* more classes allocated
* more
* mm
* fixes and progress
* no huge pages for now
* mm seems working, but switch to 512mb page for simplicity
* working state
* not cleaned
* cleaned
* nvd=1
* start gr ctx
* compute
* clean 1
* cleanup 2
* cleanup 3
* cleaner 4
* cleaner 6
* add iface to nv
* save before reboot
* merged into NV
* moveout mm
* post merge
* cleaner 7
* merge and rebase
* pciiface abstraction + reset
* download fw from web
* print logs
* minor changes + p2p
* cleaner 8
* cleaner 9
* cleaner 10
* delete
* delete this as well
* linter 1
* oops
* priv_client -> priv_root
* fix mypy
* mypy?
* mypy?
* small changes
* shorter
* ops
* remove this
* do not allocate paddr for reserve
* nodiff
* unified script
* ops
* dif ver
* add lock
* setup
2025-06-25 00:37:34 +03:00
uuuvn
c8d0f68763
Weaker renderer validation in remote ( #10964 )
...
```
training bert
training on ['REMOTE:0', 'REMOTE:1', 'REMOTE:2', 'REMOTE:3', 'REMOTE:4', 'REMOTE:5']
Traceback (most recent call last):
File "/home/uuuvn/src/tinygrad/examples/mlperf/model_train.py", line 1300, in <module>
with Profiling(enabled=getenv("PYPROFILE")): globals()[nm]()
^^^^^^^^^^^^^^^
File "/home/uuuvn/src/tinygrad/examples/mlperf/model_train.py", line 975, in train_bert
for x in GPUS: Device[x]
~~~~~~^^^
File "/home/uuuvn/src/tinygrad/tinygrad/device.py", line 22, in __getitem__
def __getitem__(self, ix:str) -> Compiled: return self.__get_canonicalized_item(self.canonicalize(ix))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/uuuvn/src/tinygrad/tinygrad/device.py", line 28, in __get_canonicalized_item
ret = [cls for cname, cls in inspect.getmembers(importlib.import_module(f'{base}.runtime.ops_{x}')) \
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/uuuvn/src/tinygrad/tinygrad/runtime/ops_remote.py", line 417, in __init__
if not renderer[0].startswith("tinygrad.renderer.") or not renderer[1].endswith("Renderer"): raise RuntimeError(f"bad renderer {renderer}")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: bad renderer ('tinygrad.runtime.ops_null', 'NullRenderer', ())
```
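The traceback above fails because the renderer tuple names a class in `tinygrad.runtime.ops_null`, while the check only accepts modules under `tinygrad.renderer.`. The actual change made in this commit is not shown here; the following is only a sketch of how such a check might be relaxed. `check_renderer_strict` mirrors the condition visible in the traceback, and `check_renderer_weak` is a hypothetical weaker variant (both function names are made up for illustration).

```python
from typing import Any, Tuple

Renderer = Tuple[str, str, Any]  # (module path, class name, constructor args)

def check_renderer_strict(renderer: Renderer) -> None:
    # Same condition as the line quoted in the traceback above.
    if not renderer[0].startswith("tinygrad.renderer.") or not renderer[1].endswith("Renderer"):
        raise RuntimeError(f"bad renderer {renderer}")

def check_renderer_weak(renderer: Renderer) -> None:
    # Hypothetical relaxation (an assumption, not the real patch): accept any
    # module inside the tinygrad package, still requiring a *Renderer class name.
    if not renderer[0].startswith("tinygrad.") or not renderer[1].endswith("Renderer"):
        raise RuntimeError(f"bad renderer {renderer}")

null_renderer: Renderer = ("tinygrad.runtime.ops_null", "NullRenderer", ())

try:
    check_renderer_strict(null_renderer)
    strict_error = None
except RuntimeError as e:
    strict_error = str(e)

check_renderer_weak(null_renderer)  # passes under the relaxed rule
```

Under these assumptions the strict check reproduces the `bad renderer` error for `NullRenderer`, while the relaxed check lets it through and still rejects modules outside the `tinygrad` package.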
2025-06-24 14:15:09 -07:00
nimlgen
26ddf8d714
amd: rename dev_iface -> iface to match nv ( #10959 )
2025-06-24 20:22:19 +03:00
nimlgen
c0d9cf09e0
system: flock ( #10949 )
...
* system: flock
* imports
* xx
2025-06-24 11:33:49 +03:00
nimlgen
5202970feb
system: move memory_barrier to System ( #10948 )
...
* system: move memory_barrier to System
* fixed
2025-06-24 11:09:43 +03:00
George Hotz
0f89660ce4
Revert "change clang -march flag to -mcpu on arm ( #10841 )" ( #10942 )
...
This reverts commit 897e42fd1b.
2025-06-23 16:48:28 -07:00
ttomsa
897e42fd1b
change clang -march flag to -mcpu on arm ( #10841 )
...
* change clang -march flag to -mcpu with fp16 disassembly test
* fix
* add capstone to macos dependencies
* just check no cast in test
* rm import
* woops
* lets check
* move check
* llvm init before cpu check
* try this
* bump autogen llvm version
* also update libclang?
* revert
* add comment
* skip llvm test and add comment
* linter
2025-06-23 16:28:48 -07:00
uuuvn
4e2c9e36c7
Remote multihost (p2p transfer) ( #10601 )
2025-06-23 11:47:29 -07:00
nimlgen
eceb7a00d2
nv: rename iface mem functions ( #10931 )
2025-06-23 16:34:51 +03:00
nimlgen
3ccdb2356b
system: factor out PCIIfaceBase ( #10917 )
...
* system: factor out PCIIfaceBase
* linter
* typing
2025-06-22 20:03:14 +03:00
nimlgen
36536ef6f0
nv: minor changes from nvpci ( #10918 )
2025-06-22 18:04:39 +03:00
nimlgen
0e7bd9fd03
factor out generic MemoryManager ( #10910 )
...
* allocator -> memory
* just moveout it
* mm is abstracted
* need entry abstraction
* fix
* mypy
2025-06-21 16:18:33 +03:00
nimlgen
bb0299b9e5
system: shared pci logic ( #10894 )
...
* moveout pci logic
* fixes
* oops
* types
* more type
* one style
* thi is imp
2025-06-21 00:09:49 +03:00