LLM Toolkit

Async-GRPO - Open, Fast, and Performant

Written by Aldo Pareja, Mustafa Eyceoz. Introducing Async-GRPO. With the incr...

Sculpting Subspaces: How We Solved Continual Learning in Large Language Models

Authors: Nikhil Shivakumar Nayak, Krishnateja Killamsetty, Ligong Han, Abhish...

Update 3 - On Reasoning vs Inference-time scaling - Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

Written by Akash Srivastava, Isha Puri, Kai Xu, Shivchander Sudalairaj, Musta...