tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-13 00:15:35 +08:00

README.md

For something in between a pytorch and a karpathy/micrograd

This may not be the best deep learning framework, but it is a deep learning framework.

Due to its extreme simplicity, it aims to be the easiest framework to add new accelerators to, with support for both inference and training. Support the simple basic ops, and you get SOTA vision (extra/efficientnet.py) and language (extra/transformer.py) models. We are working on support for the Apple Neural Engine.

Eventually, we will build custom hardware for tinygrad, and it will be blindingly fast. Now, it is slow.

Installation

pip3 install git+https://github.com/geohot/tinygrad.git --upgrade

Example

from tinygrad.tensor import Tensor

x = Tensor.eye(3)
y = Tensor([[2.0,0,-2.0]])
z = y.matmul(x).sum()
z.backward()

print(x.grad)  # dz/dx
print(y.grad)  # dz/dy

Same example in torch

import torch

x = torch.eye(3, requires_grad=True)
y = torch.tensor([[2.0,0,-2.0]], requires_grad=True)
z = y.matmul(x).sum()
z.backward()

print(x.grad)  # dz/dx
print(y.grad)  # dz/dy
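
As a cross-check on what both snippets should print, the gradients can be worked out by hand. Since z = sum over i, j of y[0][i] * x[i][j], we get dz/dx[i][j] = y[0][i] and dz/dy[0][i] = sum over j of x[i][j]. The sketch below uses plain Python lists (no tinygrad or torch) and compares these closed forms against central finite differences. The helper names forward and numeric_grad are made up for illustration.

```python
# Finite-difference check of the gradients from the example above.
# Plain lists only; forward() and numeric_grad() are illustrative helpers,
# not part of tinygrad or torch.

def forward(x, y):
    """z = sum(y @ x) for y of shape (1, n) and x of shape (n, m)."""
    n, m = len(x), len(x[0])
    return sum(y[0][i] * x[i][j] for i in range(n) for j in range(m))

def numeric_grad(f, mat, eps=1e-4):
    """Central-difference gradient of scalar f() w.r.t. each entry of mat."""
    grad = [[0.0] * len(row) for row in mat]
    for i, row in enumerate(mat):
        for j in range(len(row)):
            old = row[j]
            row[j] = old + eps
            hi = f()
            row[j] = old - eps
            lo = f()
            row[j] = old  # restore the entry
            grad[i][j] = (hi - lo) / (2 * eps)
    return grad

x = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]  # eye(3)
y = [[2.0, 0.0, -2.0]]

# Closed forms: dz/dx[i][j] = y[0][i], dz/dy[0][i] = sum_j x[i][j]
dz_dx = [[y[0][i]] * 3 for i in range(3)]
dz_dy = [[sum(x[i]) for i in range(3)]]

num_dx = numeric_grad(lambda: forward(x, y), x)
num_dy = numeric_grad(lambda: forward(x, y), y)

print(dz_dx)  # [[2.0, 2.0, 2.0], [0.0, 0.0, 0.0], [-2.0, -2.0, -2.0]]
print(dz_dy)  # [[1.0, 1.0, 1.0]]
```

Because z is linear in both x and y, the finite-difference estimates agree with the closed forms up to floating-point rounding.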

Neural networks?

It turns out, a decent autograd tensor library is 90% of what you need for neural networks. Add an optimizer (SGD, RMSprop, and Adam implemented) from tinygrad.optim, write some boilerplate minibatching code, and you have all you need.
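
The "boilerplate minibatching code" can be as small as a generator that shuffles indices and yields slices of them. The sketch below uses only the standard library. The name minibatches and its parameters are hypothetical, not a tinygrad helper, and this is one possible scheme rather than the one any particular test file uses.

```python
import random

def minibatches(n_samples, batch_size, seed=None):
    """Yield lists of indices that cover range(n_samples) once, in random order."""
    rng = random.Random(seed)
    order = list(range(n_samples))
    rng.shuffle(order)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

# One epoch over 10 samples in batches of 4; the last batch is smaller.
batches = list(minibatches(10, 4, seed=0))
print([len(b) for b in batches])  # [4, 4, 2]
```

Each index list would select the rows of the input and label arrays that get wrapped in Tensors for one training step.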

Neural network example (from test/test_mnist.py)

from tinygrad.tensor import Tensor
import tinygrad.optim as optim

class TinyBobNet:
  def __init__(self):
    self.l1 = Tensor.uniform(784, 128)
    self.l2 = Tensor.uniform(128, 10)

  def forward(self, x):
    return x.dot(self.l1).relu().dot(self.l2).logsoftmax()

model = TinyBobNet()
optimizer = optim.SGD([model.l1, model.l2], lr=0.001)  # separate name so the optim module is not shadowed

# ... and complete like pytorch, with (x,y) data

out = model.forward(x)
loss = out.mul(y).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
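
The loss line out.mul(y).mean() only behaves as a classification loss if y carries a negative weight at the true class, because out holds log-probabilities. One encoding that makes it equal the mean negative log-likelihood is -num_classes at the true class and 0 elsewhere. That this is the encoding used alongside the example is an assumption here. The sketch below checks the identity in plain Python, and logsoftmax and encode_labels are illustrative helpers rather than tinygrad functions.

```python
import math

def logsoftmax(row):
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def encode_labels(labels, num_classes):
    """Assumed encoding: -num_classes at the true class, 0 elsewhere."""
    return [[-float(num_classes) if c == t else 0.0 for c in range(num_classes)]
            for t in labels]

logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
labels = [0, 2]
out = [logsoftmax(r) for r in logits]
y = encode_labels(labels, 3)

# Equivalent of out.mul(y).mean(): elementwise product, mean over all entries.
flat = [o * t for orow, trow in zip(out, y) for o, t in zip(orow, trow)]
loss = sum(flat) / len(flat)

# Mean negative log-likelihood of the true classes, for comparison.
nll = -sum(out[b][t] for b, t in enumerate(labels)) / len(labels)
print(abs(loss - nll) < 1e-9)
```

The factor of num_classes cancels the extra division that mean() performs over the class dimension.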

GPU and Accelerator Support

tinygrad supports GPUs through PyOpenCL.

from tinygrad.tensor import Tensor
(Tensor.ones(4,4).gpu() + Tensor.ones(4,4).gpu()).cpu()

ANE Support?!

If all you want to do is ReLU, you are in luck! You can do very fast ReLU (at least 30 MEGAReLUs/sec confirmed).

This requires your Python to be signed with ane/lib/sign_python.sh to add the com.apple.ane.iokit-user-access entitlement, which in turn requires amfi_get_out_of_my_way=0x1 in your boot-args. Build the library with ane/lib/build.sh.

from tinygrad.tensor import Tensor

a = Tensor([-2,-1,0,1,2]).ane()
b = a.relu()
print(b.cpu())

Warning: do not rely on the ANE port. It segfaults sometimes. So if you were doing something important with tinygrad and wanted to use the ANE, you might have a bad time.

Adding an accelerator

You need to support 14 first-class ops:

Relu, Log, Exp                  # unary ops
Sum, Max                        # reduce ops (with axis argument)
Add, Sub, Mul, Pow              # binary ops (with broadcasting)
Reshape, Transpose, Slice       # movement ops
Matmul, Conv2D                  # processing ops

While more ops may be added, I think this base is stable.
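
To make the op categories concrete, here is a toy reference for one op from each of three groups, written on nested Python lists: a unary op (relu), a reduce op with an axis argument (sum), and a binary op with broadcasting (add, limited here to a matrix plus a row vector). The function names and the list representation are illustrative assumptions. A real backend would implement these on its own buffers behind tinygrad's op interface, which is not shown here.

```python
def relu(a):
    """Unary op: elementwise max(x, 0) on a 2D list."""
    return [[max(v, 0.0) for v in row] for row in a]

def sum_axis(a, axis):
    """Reduce op: sum a 2D list along axis 0 (collapsing rows) or axis 1 (collapsing columns)."""
    if axis == 0:
        return [sum(col) for col in zip(*a)]
    if axis == 1:
        return [sum(row) for row in a]
    raise ValueError("axis must be 0 or 1 for a 2D input")

def add_broadcast(a, row_vec):
    """Binary op: add a length-m vector to every row of an (n, m) matrix."""
    if any(len(row) != len(row_vec) for row in a):
        raise ValueError("shapes are not broadcastable")
    return [[v + b for v, b in zip(row, row_vec)] for row in a]

a = [[-2.0, -1.0, 0.0], [1.0, 2.0, 3.0]]
print(relu(a))                               # [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
print(sum_axis(a, 0))                        # [-1.0, 1.0, 3.0]
print(sum_axis(a, 1))                        # [-3.0, 6.0]
print(add_broadcast(a, [10.0, 20.0, 30.0]))  # [[8.0, 19.0, 30.0], [11.0, 22.0, 33.0]]
```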

ImageNet inference

Despite being tiny, tinygrad supports the full EfficientNet. Pass in a picture to discover what it is.

ipython3 examples/efficientnet.py https://upload.wikimedia.org/wikipedia/commons/4/41/Chicken.jpg

Or, if you have a webcam and cv2 installed:

ipython3 examples/efficientnet.py webcam

PROTIP: Set "GPU=1" environment variable if you want this to go faster.

PROPROTIP: Set "DEBUG=1" environment variable if you want to see why it's slow.

tinygrad also supports GANs

See examples/mnist_gan.py

The promise of small

tinygrad will always be below 1000 lines. If it isn't, we will revert commits until tinygrad becomes smaller.

Running tests

python3 -m pytest

TODO

Train an EfficientNet on ImageNet
Add a language model. BERT?
Add a detection model. EfficientDet?
Reduce code
Increase speed
Add features
