* improvements to the LLM server
* eval script
* eval LLM
* better eval gets 58.71
* cleanups
* add temperature, but multinomial is absurdly slow
* claude is so smart
* lint
* remove slop
* no more stop
* start LLM app, tons of cleanup required. target is a 200-line ollama
* kind of works
* simpler
* add k/v cache
* with SYM=1, it loops
* no RoPE cache
* simpler
* more cleanups
* cleanups
* works
* argparse and comments
* from GGUF
* generate is a function
* no copy from cpu
* fix max context pass in
* test
* improve test
* ai2_arc
* fix 8B, use less RAM
* 136 lines
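
On the temperature item above: a minimal sketch of temperature sampling in plain Python, for illustration only and not the code from this change. It uses the Gumbel-max trick (add Gumbel noise to the scaled logits, then take the argmax) as one way to sample without a multinomial call; whether the change does this is not stated. `sample_token` and its parameters are hypothetical names.

```python
import math
import random

def sample_token(logits, temperature=1.0, rng=random):
    """Pick a token index from raw logits (hypothetical helper, not from this change)."""
    if temperature <= 0:
        # Assumption: treat non-positive temperature as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    best_i, best_v = 0, -math.inf
    for i, logit in enumerate(logits):
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
            u = rng.random()
        gumbel = -math.log(-math.log(u))
        # argmax of (logit / T + Gumbel noise) is a sample from softmax(logits / T)
        v = logit / temperature + gumbel
        if v > best_v:
            best_i, best_v = i, v
    return best_i
```

Lower temperatures concentrate samples on the highest logit; higher temperatures flatten the distribution.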