* add torch inplace tests
* first set of tests passing
* wrap all inplace funcs, add more tests
* fixes and wrap more functions
* fix all uint8 tests to avoid slow tests
* fix the one test
* another test, another fix
* and one more fix, works for DDP now
* something on contiguous, cleanup
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>