tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-11 23:46:02 +08:00

Author	SHA1	Message	Date
Nick	af94bfc401	fix retinanet shared memory race condition in parallel tests (#15030) Append PID to shared memory names in batch_load_retinanet to prevent FileExistsError when pytest-xdist runs multiple test workers that each call _setup_shared_mem with the same hardcoded name.	2026-02-27 08:36:24 +08:00
George Hotz	55d3a5def9	preallocate all realized buffers (#14823) * preallocate all realized buffers * contiguous * work * comment that out * move to schedule * better * correct fix * just buffer * disk bufs * fixes disk tensor stuff * fix symbolic stuff * fix multi * 162 failures * bugfixes * don't check that anymore * fix schedule tests * mnist should be contiguious * type and buffer * fix tests * shrink axis correction * mypy fixes * tests skips * same 37 failures * dedup * no shrink in the graph * 29 failures * skips * fix custom kernel * fix training * those optimizations aren't supported currently * simpler * more correct * tests * 14 failures * works * fix that test * broken * 11 failures * only kernel counts left * fixes * all tests pass * remove tensor_map * op test * 200 -> 230 * test fixes * fixes * revert test_tiny thing * guard * revert that * test tiny passes * no contigs there * base realize back * Revert "no contigs there" This reverts commit `c45bb9fcfd`. * revert that * chop many assigns * 12 failures * fix tests * tests * apply after * pre-commit * remove old code * delete that * fix types * remove extra contig * fix dataloader * torch fix * disk fix * update kernel fusion numbres * runs on amd * restore kernel count * add that rule back * that * disable that * wrong * add the correct rule for that folding * more tests * guard c1.arg * no newlines * realize those * split into a different file * remove detach/contig back * skip 2 * update that	2026-02-20 20:05:54 +08:00
George Hotz	fc5677c28b	resnet dataloader + more test cleanups (#14899) * resnet dataloader * tests	2026-02-20 10:05:47 +08:00
wozeparrot	a60220bed9	llama3: move dl to numpy & jit more (#14677) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2026-02-10 18:16:40 -08:00
chenyu	d57d24c7d4	Buffer.as_buffer -> Buffer.as_memoryview [pr] (#14535) it casts to memoryview. also inline the as_typed_buffer checks to Tensor._data	2026-02-04 11:31:11 -05:00
b1tg	241f0402b4	add seed in bert data shuffle (#14054)	2026-01-07 10:02:05 -05:00
chenyu	da1cb6a9ec	update llama dataloader (#13825) separate creating dataset from itererating over the dataset to not create eval data for each eval	2025-12-24 17:42:08 -05:00
hooved	1e8945a28c	Training loop for Stable Diffusion mlperf (#12315) * add diff * fix edit error * match master * point reference to specific commit * simplify wandb logging * remove lr test, dehardcode device * increase stack size limit	2025-10-03 02:45:38 -04:00
wozeparrot	7e68045fb2	feat: small llama3 training (#11829)	2025-08-31 13:41:47 -07:00
wozeparrot	7ae4335127	feat: generate blend index (#11566)	2025-08-07 14:20:28 -04:00
wozeparrot	2d5bdc939d	faster llama3 dataloader (#11540)	2025-08-06 18:25:57 -04:00
chenyu	f7965f85aa	Revert "feat: faster index building (#11462)" (#11478) This reverts commit `3a4deb08d2`.	2025-08-02 12:50:48 -04:00
wozeparrot	3a4deb08d2	feat: faster index building (#11462) * feat: faster index building * feat: correct training samples	2025-08-02 11:50:18 -04:00
wozeparrot	825b6a2505	feat: llama3 dataloader (#11340)	2025-07-30 13:27:55 -07:00
George Hotz	32e9949052	rename lazydata to uop (#10698)	2025-06-08 08:42:22 -07:00
George Hotz	bfc30fa6ea	hotfix: typo in shm_name	2025-05-14 19:34:52 -07:00
George Hotz	2bc54b3e22	manually handle OSX	2025-05-14 19:17:51 -07:00
George Hotz	ab460486d7	Revert "resnet dataloader osx (#10316)" This reverts commit `aef336930a`.	2025-05-14 19:15:07 -07:00
George Hotz	aef336930a	resnet dataloader osx (#10316) * mlperf dataloader on mac * resnet dataloader [pr] * simple should work	2025-05-14 18:31:26 -07:00
chenyu	74c6cf8be3	lint mlperf model_train (#10038)	2025-04-24 16:19:44 -04:00
Francis Lata	eb95825eea	RetinaNet dataloader (#9442) * retinanet dataloader * remove batch_size from generate_anchors * refactor kits19 dataset tests * add tests for dataloader * fix testing setup and cleanups * remove unused import	2025-03-21 13:36:41 -04:00
qazal	845814f396	revert buffer_view change (#9311) * Revert "BUFFER_VIEW is a node in the kernel graph + delete ViewOp (#9298)" This reverts commit `3210b656b6`. * Revert "substitute ast from kernel op [pr] (#9293)" This reverts commit `5a9c788ae6`.	2025-03-01 11:00:12 +01:00
qazal	3210b656b6	BUFFER_VIEW is a node in the kernel graph + delete ViewOp (#9298)	2025-02-28 12:15:04 +02:00
chenyu	2e7c2780a9	CLANG -> CPU (#9189)	2025-02-20 18:03:09 -05:00
chenyu	975c318dbc	bert use int32 for input ids (#9173) original data was int32 for these. float might have caused precision issues	2025-02-19 08:17:27 -05:00
chenyu	ff05bff221	put bert data shard inside jit (#9160) python time 45ms -> 9ms, it was spending time to schedule the shard also init bert data on CLANG since it's from numpy, so we don't create the tensor on default device then shard into GPUS	2025-02-18 10:36:54 -05:00
chenyu	994944920b	simpler batch_load_train_bert [pr] (#8582) don't think that buffer is really beneficial. 5% faster data_time and 1ms faster per step. https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/69c9lx8y/overview	2025-01-12 20:25:05 -05:00
qazal	9828277c03	view doesn't have buffer, fix the tests [pr] (#7841) * view doesn't have buffer, fix the tests [pr] * need assigns	2024-11-22 20:41:55 +08:00
chenyu	1dab75ae37	clean up mlperf dataloader import (#5940) use tinygrad tqdm for dataset, and PIL Image is only needed for resnet	2024-08-06 17:10:08 -04:00
Elias Wahl	4a114756f6	New BERT dataloader (#5881) * One file == One topic * update test * new dataloader * update train script * get index is faster	2024-08-02 15:12:23 -04:00
Francis Lata	a0baff7a3d	update dataloader script example (#5818)	2024-07-30 15:18:29 -04:00
Francis Lata	0345577032	UNet3D dataloader shared memory fix (#5465) * create separate SharedMemory between inputs and labels * update path check for shared mem * clean up unit test for dataset	2024-07-13 20:26:00 -04:00
Francis Lata	707099487a	Multiprocessing UNet3D dataloader (#4801) * testing dataloader * matching dataloader implementation for unet3d * remove comments * clean up dataloader * add cookie and cleanup * use shm_path when creating SharedMemory * add support for testing resnet and unet3d dataloaders * update dataset test to return preprocesed data directory in prep for dataloader testing * pass preprocessed dataset directory properly * update loader function for dataloader * add shuffling on indices * update shm name * more cleanup for unet3d dataloader * remove changes to tests --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-06-02 11:30:47 -04:00
Elias Wahl	acc0039cfc	Resume fix + scheduler for non weight decay params (#4679) * move ckpt dir * fix resume. Add scheduler group	2024-05-21 19:38:13 -04:00
chenyu	2c3b7f8e70	pad resnet training data with training data mean (#4369) update model_train resnet to pad training	2024-05-02 20:26:15 -04:00
chenyu	bf31837e6d	resnet correct steps_in_val_epoch in logging (#4389) also added random seed from system in scripts	2024-05-02 10:51:36 -04:00
chenyu	6628e13a5f	pad resnet eval data in model_train (#4374) asserted if eval sample count is different from total eval file count.	2024-05-01 14:33:42 -04:00
chenyu	683b7c605a	pad first batch of imagenet dataloader and update eval (#4368) * pad first batch of imagenet dataloader and update eval * pad zero instead of empty for training	2024-05-01 00:21:52 -04:00
Elias Wahl	27613dd881	MLPerf BERT: Main training loop (#4288) * BERT language modeling head + trunc normal initializers * add train loop + helpers * shuffle in dataloaders + slight changes in main loop * beam change * Minor changes * random.shuffle * HParam update * Use deque for dataloader * wandb bert project name * half fixes * BENCHMARK + remove epoch * cast + print() --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-04-29 14:35:27 -04:00
Elias Wahl	3a48773f1a	BERT dataloader (#4252) * add dataloader * comment	2024-04-23 13:44:49 -04:00
David Hou	199f7c4342	MLPerf Resnet (cleaned up) (#3573) * this is a lot of stuff TEST_TRAIN env for less data don't diskcache get_train_files debug message no lr_scaler for fp32 comment, typo type stuff don't destructure proc make batchnorm parameters float make batchnorm parameters float resnet18, checkpointing hack up checkpointing to keep the names in there oops wandb_resume lower lr eval/ckpt use e+1 lars report top_1_acc some wandb stuff split fw and bw steps to save memory oops save model when reach target formatting make sgd hparams consistent just always write the cats tag... pass X and Y into backward_step to trigger input replace shuffle eval set to fix batchnorm eval dataset is sorted by class, so the means and variances are all wrong small cleanup hack restore only one copy of each tensor do bufs from lin after cache check (lru should handle it fine) record epoch in wandb more digits for topk in eval more env vars small cleanup cleanup hack tricks cleanup hack tricks don't save ckpt for testeval cleanup diskcache train file glob clean up a little device_str SCE into tensor small small log_softmax out of resnet.py oops hack :( comments HeNormal, track gradient norm oops log SYNCBN to wandb real truncnorm less samples for truncated normal custom init for Linear log layer stats small Revert "small" This reverts commit `988f4c1cf3`. Revert "log layer stats" This reverts commit `9d98224585`. rename BNSYNC to SYNCBN to be consistent with cifar optional TRACK_NORMS fix label smoothing :/ lars skip list only weight decay if not in skip list comment default 0 TRACK_NORMS don't allocate beam scratch buffers if in cache clean up data pipeline, unsplit train/test, put back a hack remove print run test_indexing on remu (#3404) * emulated ops_hip infra * add int4 * include test_indexing in remu * Revert "Merge branch 'remu-dev-mac'" This reverts commit `6870457e57`, reversing changes made to `3c4c8c9e16`.
fix bad seeding UnsyncBatchNorm2d but with synced trainable weights label downsample batchnorm in Bottleneck :/ :/ i mean... it runs... its hits the acc... its fast... new unsyncbatchnorm for resnet small fix don't do assign buffer reuse for axis change * remove changes * remove changes * move LARS out of tinygrad/ * rand_truncn rename * whitespace * stray whitespace * no more gnorms * delete some dataloading stuff * remove comment * clean up train script * small comments * move checkpointing stuff to mlperf helpers * if WANDB * small comments * remove whitespace change * new unsynced bn * clean up prints / loop vars * whitespace * undo nn changes * clean up loops * rearrange getenvs * cpu_count() * PolynomialLR whitespace * move he_normal out * cap warmup in polylr * rearrange wandb log * realize both x and y in data_get * use double quotes * combine prints in ckpts resume * take UBN from cifar * running_var * whitespace * whitespace * typo * if instead of ternary for resnet downsample * clean up dataloader cleanup a little? * separate rng for shuffle * clean up imports in model_train * clean up imports * don't realize copyin in data_get * remove TESTEVAL (train dataloader didn't get freed every loop) * adjust wandb_config entries a little * clean up wandb config dict * reduce lines * whitespace * shorter lines * put shm unlink back, but it doesn't seem to do anything * don't pass seed per task * monkeypatch batchnorm * the reseed was wrong * add epoch number to desc * don't unsyncedbatchnorm is syncbn=1 * put back downsample name * eval every epoch * Revert "the reseed was wrong" This reverts commit 3440a07dff3f40e8a8d156ca3f1938558a59249f. * cast lr in onecycle * support fp16 * cut off kernel if expand after reduce * test polynomial lr * move polynomiallr to examples/mlperf * working PolynomialDecayWithWarmup + tests....... 
add lars_util.py, oops * keep lars_util.py as intact as possible, simplify our interface * no more half * polylr and lars were merged * undo search change * override Linear init * remove half stuff from model_train * update scheduler init with new args * don't divide by input mean * mistake in resnet.py * restore whitespace in resnet.py * add test_data_parallel_resnet_train_step * move initializers out of resnet.py * unused imports * log_softmax to model output in test to fix precision flakiness * log_softmax to model output in test to fix precision flakiness * oops, don't realize here * is None * realize initializations in order for determinism * BENCHMARK flag for number of steps * add resnet to bechmark.yml * return instead of break * missing return * cpu_count, rearrange benchmark.yml * unused variable * disable tqdm if BENCHMARK * getenv WARMUP_EPOCHS * unlink disktensor shm file if exists * terminate instead of join * properly shut down queues * use hip in benchmark for now --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-14 00:53:41 -04:00
chenyu	3d9b882d37	hotfix unlink /dev/shm/resnet_X if it already exists (#3726)	2024-03-13 18:53:03 -04:00
David Hou	2befdf86d9	dataloader worker/shm cleanup (#3710)	2024-03-12 21:44:24 -04:00
George Hotz	9cc2577a08	use hip events (#3157) * use hip events * cleanup	2024-01-17 10:39:57 -08:00
George Hotz	a464909d79	fast resnet eval (#3135) * fast resnet eval * fix HIP multidevice graph * neater expression for devices * lines * add decorator test	2024-01-15 14:15:18 -08:00

45 Commits