* bigmodel
* more debug print
* debugging bigmodel
* remove the tanh, debugging
* print images/buffers
* disassemble the command queues
* decompiler
* dump the shaders
* full disasm
* support patching kernel and fixing convolution_horizontal_reduced_reads_1x1
* microbenchmark
* 42 GFLOPS, 1 GB/s
* gemm benchmark
* 75 GFLOPS vs 42 GFLOPS
* 115 GFLOPS
* oops, never mind
* gemm image is slow
* this is pretty hopeless
* gemm image gets 62 GFLOPS
* this is addictive and still a waste of time
* cleanup cleanup
* that hook was dumb
* tabbing
* more tabbing

Co-authored-by: Comma Device <device@comma.ai>
old-commit-hash: 78a352a8ca8a948e86e7c752732e470f89d92280
4 lines
129 B LFS
C++
version https://git-lfs.github.com/spec/v1
oid sha256:dc1020ef4bbdd7ae0fc3ea2bdf76478f82a1708c2838795f768b2c5a5fa86423
size 3019