tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-14 00:45:16 +08:00

Author	SHA1	Message	Date
George Hotz	a9b6cfece0	refactor llm into files (#15780) * refactor llm into files * chat.html * tokenizer cleanup * cleanup * tests	2026-04-17 12:33:11 +08:00
George Hotz	ec00cefa5b	llm is the only app (#15779) * tinygrad/llm is the only app * upd pyproject * claude refs * scoping * min diff	2026-04-17 10:44:48 +08:00
George Hotz	1ae6528bb6	move schedule into schedule (#15736) * move schedule into schedule * callify to root * sched docs	2026-04-15 11:03:25 +08:00
George Hotz	b5a9465b13	llm: add support for moonlight (deepseek MLA) (#15466) * add gguf Q5_0 * it works * rebase * simpler test * class * less diff * dicts * normal names * simplify * this * simpler * work * work	2026-04-11 10:32:48 +08:00
George Hotz	9092f2a8c0	llm: add shared_expert and rope_dim support from qwen35 (#15673) * llm: add shared_expert and rope_dim support from qwen35 * refactor into FFNBlock and TransformerBlock * norms where they belong	2026-04-10 19:18:27 +08:00
nimlgen	5181c8e23a	llm: fix nan in kvcache (#15552)	2026-04-01 00:38:45 +03:00
George Hotz	fe2690399b	llm: support assistant prefill + refactor to TransformerConfig (#15457) * llm: support assistant prefill * refactor to ModelConfig * TransformerConfig * more	2026-03-25 10:50:48 +08:00
George Hotz	a33ac869aa	llm server: temperature + test client (#15444) * improvements to the llm server * eval script * eval llm * better eval gets 58.71 * cleanups * add temperature, but multinomial is absurdly slow * claude is so smart * lint * remove slop * no more stop	2026-03-24 21:07:15 +08:00
b1tg	891a73befc	llm: fix chunked prefill (#15182) * llm: fix chunked prefill * less lines --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2026-03-07 22:08:31 +08:00
George Hotz	e97922a57c	LLM speedup with two jits, prefill/rollout (#15153) * START_TIME * print cleanup * fix tests	2026-03-05 16:21:09 +08:00
George Hotz	ac1847cbf7	fully symbolic llm (#15097) * work * llm symbolic (almost) * work * revert that * llm sym * works * cleanups * cache tokens with the kv cache * cleanups * cleanups	2026-03-05 10:22:11 +08:00
George Hotz	d59e6e7a37	move more tests to test/null, split some existing ones (#14512) * move more tests to test/null, split some existing ones * null work * null work * move more * fixes * move PIL * PIL in CLIP * don't move that	2026-02-03 20:20:20 +08:00
chenyu	6279ae4a94	remove llm generate always reset start_pos (#14276) * remove llm generate always reset start_pos by itself seems like a bug, also added a test to repro forward_jit.reset() issue * issue is jit graph, so revert that test	2026-01-21 16:54:30 -05:00
George Hotz	5e24643889	minor import speedups (#14244) * minor import speedups * server stuff in server places * pre-commit * fix	2026-01-20 15:05:36 +09:00
George Hotz	6439a515be	test fixups / speedups / var_vals refactor (#13812) * no PYTHONPATH + llm server port 0 * llm tok speedup * refactor var_vals	2025-12-23 12:05:59 -05:00
George Hotz	321ab943b2	qwen model is working (#13690) * qwen model is mostly working * add Q4_K quantization support to GGUF parser, add qwen3:1.7b model - Add Q4_K (type 12) dequantization in nn/state.py - Add qwen3:1.7b model using Q4_K_M quantization (smaller than Q8_0) - Make bos_token_id optional for models like Qwen3 that don't have it - Fix line length issues and add preset parameter to SimpleTokenizer 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * smaller diff * test dequant * half split * better * simple tok * mock token * polish * better * fix * replace --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-15 18:00:34 -04:00
George Hotz	316da9f7ff	llm: add created/model fields, non-streaming support, and tests (#13660) * llm: add created/model fields, non-streaming support, and tests - Add `created` timestamp and `model` fields to response (required by OpenAI spec) - Add non-streaming mode support for /v1/chat/completions - Add `send_data` helper to HTTPRequestHandler for responses with Content-Length - Refactor viz/serve.py to use send_data - Add integration tests using real OpenAI client 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * add openai to testing * toml * Remove 'openai' from dependencies Removed 'openai' from the dependencies list. * bump cache --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 14:50:36 -05:00

17 Commits