The tinygrad framework has four pieces * a PyTorch like frontend. * a scheduler which breaks the compute into kernels. * a lowering engine which converts ASTs into code that can run on the accelerator. * an execution engine which can run that code. There is a good [bunch of tutorials](https://mesozoic-egg.github.io/tinygrad-notes/) by Di Zhu that go over tinygrad internals. There's also a [doc describing speed](../developer/speed.md) ## Frontend Everything in [Tensor](../tensor/index.md) is syntactic sugar around constructing a graph of [UOps](../developer/uop.md). The `UOp` graph specifies the compute in terms of low level tinygrad ops. Not all UOps will actually become realized. There's two types of UOps, base and view. base contains compute into a contiguous buffer, and view is a view. Inputs to a base can be either base or view, inputs to a view can only be a single base. ## Scheduling The [scheduler](https://github.com/tinygrad/tinygrad/tree/master/tinygrad/schedule/__init__.py) converts the graph of UOps into a `LINEAR` UOp whose `src` is a list of `CALL` UOps. One `CALL` is one kernel on the GPU, and the scheduler is responsible for breaking the large compute graph into subgraphs that can fit in a kernel. The `CALL`'s `src[0]` (a `SINK` ast) specifies what compute to run, and the remaining `src` are the buffers to run it on. ## Lowering The code in [realize](https://github.com/tinygrad/tinygrad/tree/master/tinygrad/engine/realize.py) lowers each `CALL` by compiling its ast into a `PROGRAM` and running it. ::: tinygrad.engine.realize.run_linear There's a ton of complexity hidden behind this, see the `codegen/` directory. First we lower the AST to UOps, which is a linear list of the compute to be run. This is where the BEAM search happens. Then we render the UOps into code with a `Renderer`, then we compile the code to binary with a `Compiler`. ## Execution `run_linear` walks the `LINEAR` UOp, dispatching each `CALL` to a runner (kernel, copy, view, encdec, or graph). ## Runtime Runtimes are responsible for device-specific interactions. They handle tasks such as initializing devices, allocating memory, loading/launching programs, and more. You can find more information about the runtimes API on the [runtime overview page](runtime.md). All runtime implementations can be found in the [runtime directory](https://github.com/tinygrad/tinygrad/tree/master/tinygrad/runtime). ### HCQ Compatible Runtimes HCQ API is a lower-level API for defining runtimes. Interaction with HCQ-compatible devices occurs at a lower level, with commands issued directly to hardware queues. Some examples of such backends are [NV](https://github.com/tinygrad/tinygrad/tree/master/tinygrad/runtime/ops_nv.py) and [AMD](https://github.com/tinygrad/tinygrad/tree/master/tinygrad/runtime/ops_amd.py), which are userspace drivers for NVIDIA and AMD devices respectively. You can find more information about the API on [HCQ overview page](hcq.md)