Commit Graph

4997 Commits

Author SHA1 Message Date
hikettei
0eee08b87c [experiment] topological sort when doing _recursive_group (i dunno if this is good but at least it works.) 2024-07-05 21:26:18 +09:00
hikettei
29bf027f87 [refactor] improved the code consistency of payne hanek 2024-07-05 21:06:55 +09:00
hikettei
08633ea366 [refactor] more readable payne hanek impl 2024-07-05 18:32:04 +09:00
hikettei
77f7b4d93a [refactor] refactored some redundant parts existing in payne hanek 2024-07-05 18:02:55 +09:00
hikettei
49f41b6a75 some minor simplification to payne hanek reduction 2024-07-05 17:42:43 +09:00
hikettei
75c635c78f xsin is reluctant to call payne_hanek_reduction which is slow to compile, passing stable diffusion compilation in a realistic time 2024-07-05 16:04:00 +09:00
hikettei
66a9752b71 [Update] isNaN(x) Free log2 algorithm, passing PTX tests, METAL with fastmath enabled is able to handle nan well, amd backend will not crash. 2024-07-04 16:34:59 +09:00
hikettei
074a93d2d5 Merge branch 'master' into faster-approx-fix-cycle-graph 2024-07-04 13:14:14 +09:00
Tobias Fischer
0c3a35e5c2 Stable Diffusion v2 Inference (#5283)
* model implementation

* clip fix, more qol options
2024-07-03 22:47:10 -04:00
chenyu
e5ba385f03 remove first contiguous in multi from_sharded (#5121)
second contiguous guarantees lbs are contiguous going into MultiLazyBuffer, don't need the first contiguous
2024-07-03 19:42:56 -04:00
hikettei
0bad0ec480 Merge branch 'master' into faster-approx-fix-cycle-graph 2024-07-04 08:29:23 +09:00
chenyu
f1ff65e763 remove "no-nans-fp-math"="true" for LLVM (#5282)
fixed isnan for llvm (still have issue with < nan)
2024-07-03 17:52:50 -04:00
chenyu
3929a9dc94 fix UOp.cmp_tuple for ALU (#5280)
* fix UOp.cmp_tuple for ALU

for ALU, use self.arg instead of self.op to compare

* skip that?
2024-07-03 14:59:05 -04:00
qazal
a9d6a6c339 verify_lazyop with multi reduce (#5276)
* outsource the assert to the implicit movement op check

* tests
2024-07-03 20:15:42 +03:00
George Hotz
16e3b8b013 uops work from lowerer [run_process_replay] (#5279) 2024-07-03 09:40:00 -07:00
chenyu
622b7bd556 simpler TinyJit inside TinyJit detection (#5219)
* simpler TinyJit inside TinyJit detection

suggested in 73395b998b (commitcomment-143660402)

* cannot repro...

* clear the way out

* finally clear
2024-07-03 12:28:53 -04:00
gip
04ef0fd328 fix: message when applegpu tools missing (#5236) 2024-07-03 09:07:09 -07:00
reddyn12
d3e244d8b7 prev speed improvements (#5252)
Co-authored-by: reddyn <nikidsniper@gmail.com>
2024-07-03 09:06:01 -07:00
hikettei
e99204740a Merge branch 'master' into faster-approx-fix-cycle-graph 2024-07-03 21:14:47 +09:00
hikettei
926738bd3d Merge branch 'faster-approx-fix-cycle-graph' of github.com:hikettei/tinygrad into faster-approx-fix-cycle-graph 2024-07-03 20:58:16 +09:00
hikettei
6e16609961 [Patch] Creating a mask for exp2 using x <= Inf satisfies True as long as x is a real value 2024-07-03 20:57:31 +09:00
nimlgen
21d41f06a2 nv follows HCQCompatAllocRes protocol (#5275)
* nv follows HCQCompatAllocRes protocol

* fix amd
2024-07-03 11:34:10 +03:00
Vyacheslav Pachkov
d3e4e21759 add return type for HCQCompatAllocator _alloc (#5267)
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-07-03 10:25:44 +03:00
chenyu
191463a919 add timing to SDXL (#5273) 2024-07-02 23:29:54 -04:00
chenyu
b2c3a28a5e nn.RMSNorm (#5272)
the norm itself has no significant value to add to Tensor method, but we would want Tensor.normalize
2024-07-02 21:39:01 -04:00
chenyu
9a2a82a77f test stable diffusion unet in ci (#5268)
unet is parameterized now so can test a smaller one in ci
2024-07-02 21:37:52 -04:00
chenyu
ce52b10f6f add a flag DISABLE_LOOP_COLLAPSE (#5270)
workaround if user encountered UNMUL error
2024-07-02 20:01:11 -04:00
George Hotz
e53b164e1a small changes from lowerer (#5266) 2024-07-02 15:03:54 -07:00
nimlgen
7be776f9af add _alloc_signal/_free_signal to hcq (#5264)
* add _alloc_signal/_free_signal api

* oops, revert this

* linter
2024-07-02 23:35:39 +03:00
Tobias Fischer
9a25ee0b9a pixed unet call params (#5262) 2024-07-02 12:40:27 -04:00
hikettei
9fb81b8a29 Merge branch 'master' into faster-approx-fix-cycle-graph 2024-07-02 22:52:17 +09:00
qazal
59bc837ad1 refactor gated load rendering [run_process_replay] (#5259)
* refactor gated load rendering [run_process_replay]

* hotfix: extra line

* remove llvm diff
2024-07-02 15:13:10 +03:00
nimlgen
e050603b4b nv close fds after mapping (#5246) 2024-07-02 13:57:46 +03:00
qazal
d3cfb6c2e3 refactor UOps.LOAD barrier [run_process_replay] (#5258) 2024-07-02 13:48:47 +03:00
hikettei
e0de4e0897 Merge branch 'faster-approx-fix-cycle-graph' of github.com:hikettei/tinygrad into faster-approx-fix-cycle-graph 2024-07-02 18:56:47 +09:00
hikettei
d7f15c0fc4 updated the count of constant folding 2024-07-02 18:51:29 +09:00
hikettei
362b5b73b9 Merge branch 'master' into faster-approx-fix-cycle-graph 2024-07-02 18:38:08 +09:00
hikettei
88f2072f3b [update] force to use bitcast 2024-07-02 18:33:59 +09:00
qazal
a1044e6063 iterate over scoped uops once [run_process_replay] (#5255) 2024-07-02 09:21:09 +03:00
wozeparrot
dfbee4f0f5 feat: add blobfile to testing (#5254) 2024-07-01 19:33:58 -07:00
Tobias Fischer
8c9c1cf62f Pulled CLIP and UNet into Separate Files (#5253)
* pulled clip and unet into separate files

* reference cleanup, lru cache fix

* better pool indexing
2024-07-01 22:33:01 -04:00
chenyu
5808c37302 hotfix disable flaky llama3 beam benchmark on green (#5249) 2024-07-01 15:00:47 -04:00
chenyu
b9122ecdaf revert stable diffusion validation with threefry (#5248)
* Revert "use threefry in stable diffusion benchmark (#4988)"

This reverts commit 44dfa37c70.

* sdxl and validation fix

* relax threshold
2024-07-01 14:43:47 -04:00
nimlgen
57e89645cd hcq spec test (#5226)
* start hcq spec test

* more test

* fixes

* run on amd as well

* test amdgpu exec

* fix amd

* amd mockgpu support sdma timestamp
2024-07-01 17:36:37 +03:00
Carson Powers
d7839fdc5f Add x!=0 -> (bool)x pattern [run_process_replay] [no_assert] (#5237)
* x!=0 -> (bool)x pattern

* bool != bool pattern

* redundant upat
2024-06-30 17:48:45 -07:00
George Hotz
14980f79dd hotfix: unbreak llama 2024-06-30 15:27:54 -07:00
George Hotz
146eb3a811 hotfix: add repeat_interleave docs 2024-06-30 15:25:18 -07:00
George Hotz
3df47bc21e OpenELM + repeat_interleave (#5234)
* start writing openelm

* progress...hit bug

* repeat_interleave support

* gqa

* add rotary embedding

* spp

* i think it runs correctly

* broken

* output is good now

* cleanups

* no io_uring on android
2024-06-30 15:18:39 -07:00
nimlgen
7b7b751513 simple hip backend for debugging (#5201)
* hip backend

* fix mypy

* shorter

* fixes

* tiny changes
2024-06-30 23:00:11 +03:00
chenyu
88763eb9ff fix stable_diffusion with fp16 (#5239) 2024-06-30 12:59:31 -04:00