tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-14 00:45:16 +08:00

Author	SHA1	Message	Date
George Hotz	ec00cefa5b	llm is the only app (#15779) * tinygrad/llm is the only app * upd pyproject * claude refs * scoping * min diff	2026-04-17 10:44:48 +08:00
George Hotz	f930579b7a	llm: change the default port to 8000 so you can remember it (match vLLM)	2026-04-08 11:25:38 +08:00
George Hotz	fe2690399b	llm: support assistant prefill + refactor to TransformerConfig (#15457) * llm: support assistant prefill * refactor to ModelConfig * TransformerConfig * more	2026-03-25 10:50:48 +08:00
George Hotz	a33ac869aa	llm server: temperature + test client (#15444) * improvements to the llm server * eval script * eval llm * better eval gets 58.71 * cleanups * add temperature, but multinomial is absurdly slow * claude is so smart * lint * remove slop * no more stop	2026-03-24 21:07:15 +08:00
leopf	4f0ee4e982	BPE tokenizer (#11415) * BPE works * refactor tok * oops * basic tests * fix eval * smaller diff * fix error * proper vocab decoding * use regex for splitting * escape ucatrange * full compat --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-08-04 09:52:38 -07:00
George Hotz	f7d4638e05	start LLM app, tons of clean up required. target is 200 line ollama (#11068) * start LLM app, tons of clean up required. target is 200 line ollama * kind of works * simpler * add k/v cache * with SYM=1, it loops * no rope cache * simpler * more cleanups * cleanups * works * argparse and comments * from gguf * generate is a function * no copy from cpu * fix max context pass in * test * improve test * ai2_arc * fix 8B, use less ram * 136 lines	2025-07-07 17:09:46 -07:00

6 Commits