* preallocate all realized buffers
* contiguous
* work
* comment that out
* move to schedule
* better
* correct fix
* just buffer
* disk bufs
* fixes disk tensor stuff
* fix symbolic stuff
* fix multi
* 162 failures
* bugfixes
* don't check that anymore
* fix schedule tests
* mnist should be contiguous
* type and buffer
* fix tests
* shrink axis correction
* mypy fixes
* tests skips
* same 37 failures
* dedup
* no shrink in the graph
* 29 failures
* skips
* fix custom kernel
* fix training
* those optimizations aren't supported currently
* simpler
* more correct
* tests
* 14 failures
* works
* fix that test
* broken
* 11 failures
* only kernel counts left
* fixes
* all tests pass
* remove tensor_map
* op test
* 200 -> 230
* test fixes
* fixes
* revert test_tiny thing
* guard
* revert that
* test tiny passes
* no contigs there
* base realize back
* Revert "no contigs there"
This reverts commit c45bb9fcfd.
* revert that
* chop many assigns
* 12 failures
* fix tests
* tests
* apply after
* pre-commit
* remove old code
* delete that
* fix types
* remove extra contig
* fix dataloader
* torch fix
* disk fix
* update kernel fusion numbers
* runs on amd
* restore kernel count
* add that rule back
* that
* disable that
* wrong
* add the correct rule for that folding
* more tests
* guard c1.arg
* no newlines
* realize those
* split into a different file
* remove detach/contig back
* skip 2
* update that
* update the backend to fix torch deprecation warning
* use param_hook to avoid full backward hook needlessly firing on inputs which do not require gradients
* fix indentation
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* Implement private _linalg_eigh function for tensor eigenvalue decomposition in torch backend
* Add unit test for linalg.eigh function in TestTorchBackend
This test verifies the eigenvalue decomposition of a 2x2 tensor using the linalg.eigh function, ensuring the computed eigenvalues and reconstructed tensor match the expected results.
- Implemented a new function `equal` in the torch backend to compare two tensors for equality.
- Added unit tests for the `equal` function to verify its correctness with different tensor inputs.
* Enhance tensor random functions with dtype support
- Updated `aten.uniform_` and `aten.normal_` to include dtype parameter in backend.py
- Added unit tests for uniform and normal tensor generation with specific dtypes in test.py
* Refactor test name for clarity
- Renamed `test_normal_dtype` to `test_normal` in `extra/torch_backend/test.py`
- Aims to improve readability and better reflect the test's purpose
* bug in div range folding
* simpler
* oh, this is right for indexing, but the div mod folding needs to be fixed
* reenable
* Passing test_complexity_w_unroll2 (#10068)
* Passing
* remove non_folded_divs
* Add check for negative term in div folding
* Add test
* bump that limit
* fix casted
---------
Co-authored-by: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
* add kernelize
* remove that
* kernelize returns self
* update abstractions2.py
* kernelize in test_schedule
* temp: assert BUFFER_VIEW's existence
* ASSIGN must have a buffer or subbuffer target
* assert and shrink
* fix
* padded setitem
* var
* toposort once
* extra
* base_buffer
* end with BUFFER_VIEW
* setitem for disk
* test_setitem_becomes_subbuffer
* mul slice test
* torch backend fix 1
* non-deterministic
* keep subbuffer
* Add amax support to Tensor operations
- Implemented amax function in backend.py for tensor max operations.
- Added unit tests for amax in test.py to ensure correct functionality.
* Fix formatting in amax output function
- Adjusted spacing in the amax output lambda function in backend.py
- Improved code readability for better maintenance
* fix some tests in test_ops for torch backend (171 failing)
* fix more tests (135 failures)
* fix tests (126 failing)
* handle transposed convs (109 tests failing)
* fix slice
* fix lshift & rshift and more tests (87 tests failing)
* revert accidental change
* remove unnecessary changes (82 failures)
* fix backward for avg_pool2d (78 failures)
* fix backward for avg_pool2d (78 failures)
* fix replication backpass
* fix reflection pad back pass (71 failures)
* cummax with indices, aten.mv and move out methods (67 failures)
* extract avg_pool2d and avg_pool3d to separate functions (62 failures)
* revert changes for cat_out
* rewrite avg_pool and pad without repetition
* remove duplicates from decomps
* slice rewrite and add slice_backward (59 failures)
* add dtype fixup from https://github.com/tinygrad/tinygrad/pull/9297
* fix linter error and remove Tensor.pad (48 failures)
* add select_backward and index_put (40 failures)
* fix some more tests (36 failures)
* fix more tests (12 failures)
* some cleanups and fix couple more tests (10 failures)
* cleaner way to write upsample
* some more upsample cleanups
* use lambda for upsample
* add autowrapper for upsample forward
* cumsum and max_dim without aten functions
* revert _log_softmax
* fix more tests (1 failure)
* make linter happy
* move import to appropriate func
* make linter happy
* add codes for noqa
* some more refactors
* remove comment
* remove dependency on aten function for conv backward
* some more refactors
* add returns
* revert a change from merge
* some cleanups
* remove whitespace
* remove ruff change
* revert upsample
* add masked_fill_.Tensor and scatter.src_out
* add todo
* fix test_biased_conv2d
* fix test_var_one_in_axis & test_std_one_in_axis but break test_biased_conv2d :(
* revert torch_debug
* revert torch_debug
* skip test_gather_failure for the tiny backend
* make padding registration more concise
* add nonzero
* remove scatter_add since we already have the out
* fix scatter
* remove some repetition
* make upsample backward registrations more concise
* remove select.int
* use Tensor.cumsum
* realize conv2d outputs before backward to fix test_biased_conv2d
* add a todo for realize (1 failure)
* add new_empty and new_empty_strided
* make test_pad_circular_mode forward only and remove redundant stuff
* fix linter errors
* remove expect failure
* just tb
* slice is a view_op
* contiguous only when lazydata.is_realized
* fix backward for test_pad_circular_mode
* revert torch.nn.functional.pad override
* add transpose.int and make constant_pad_nd contiguous
* slice_backwards has no kwargs
---------
Co-authored-by: chenyu <chenyu@fastmail.com>