* matrix strategy
* push env to GITHUB_ENV
* use printf instead of echo
* use temp helper function for cross os paths
* use path join
* switched to using temp helper function
* skip test on windows due to memory limit
* small fix
* removed semicolon
* touchups
* clean up
* separate tests
* test changes to test_utils on windows
* small refactor
* more cleanups
* undo helpers change
* only skip if in CI and WINDOWS
* safetensors test
* safe_save
* load back with real safetensors
* bugfix in device name. add simple torch_load
* it works for llama, but it's slower...
* mmap
* no intermediate
* load mmapped
* readinto speed
* not ready yet
* revert that
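
For context on the safetensors and mmap commits above, here is a minimal sketch of the safetensors on-disk layout: an 8-byte little-endian header length, a JSON header mapping each tensor name to `dtype`, `shape` and `data_offsets`, then the raw byte buffer. The names `safe_save_sketch` and `safe_load_sketch` are hypothetical and are not the functions added in these commits. The sketch uses only the Python standard library and treats tensor data as raw bytes.

```python
import json
import mmap
import struct


def safe_save_sketch(tensors, path):
    """Write tensors to `path` in the safetensors layout.

    `tensors` maps name -> (dtype_str, shape, raw_bytes). This input format
    is an assumption made for illustration only.
    """
    header, offset = {}, 0
    for name, (dtype, shape, raw) in tensors.items():
        # data_offsets are relative to the start of the byte buffer,
        # not to the start of the file.
        header[name] = {
            "dtype": dtype,
            "shape": list(shape),
            "data_offsets": [offset, offset + len(raw)],
        }
        offset += len(raw)
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # u64 header size
        f.write(header_bytes)
        for _dtype, _shape, raw in tensors.values():
            f.write(raw)


def safe_load_sketch(path):
    """Memory-map `path` and return name -> (dtype_str, shape, memoryview).

    The memoryviews are slices of the mapping, so no tensor bytes are copied
    until the caller reads them.
    """
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    (header_len,) = struct.unpack("<Q", mm[:8])
    header = json.loads(mm[8:8 + header_len].decode("utf-8"))
    base = 8 + header_len  # start of the byte buffer
    view = memoryview(mm)
    out = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata entry
            continue
        begin, end = info["data_offsets"]
        out[name] = (info["dtype"], tuple(info["shape"]), view[base + begin:base + end])
    return out
```

A round trip would pack a few float32 values with `struct.pack("<3f", ...)`, save them under the dtype string `"F32"`, and compare the bytes read back from the mapping. Slicing a memoryview of the mapping, rather than calling `read()`, is one way to avoid an intermediate copy; whether that matches what the commits do is not stated in the log.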