Blog
Post-Training Methods for Language Models
Post-training adapts language models for specific, safe, and practical uses. This overview highlights key methods and the open-source training_hub library.
Getting Reasoning Models Enterprise Ready
Customize reasoning models with synthetic data generation for enterprise deployment. Learn techniques from Red Hat's AI Innovation Team.
Beyond tokens per second: Unlocking smarter enterprise AI with inference-time scaling
Discover inference-time scaling techniques that improve AI quality and reliability for enterprise applications beyond just speed optimization.
Async-GRPO - Open, Fast, and Performant
Introducing Async-GRPO - an open-source library for scalable reinforcement learning with 42% efficiency gains over VERL and 11x over TRL for GRPO training.
Sculpting Subspaces: How We Solved Continual Learning in Large Language Models
Learn how our adaptive SVD method enables continual learning in LLMs with near-zero catastrophic forgetting, achieving 7% higher accuracy than baselines.
Update 3 - On Reasoning vs Inference-time scaling - Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)
Understanding the distinction between reasoning and inference-time scaling in LLMs - insights from our R1 reproduction experiments.
Update 2 - Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)
Second update on R1 reasoning research - new results on training small LLMs with synthetic reasoning data and particle filtering methods.
Update 1 - Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)
First update on R1-like reasoning experiments - Granite models show significant gains with particle filtering and new data quality experiments.