Blog

Post-Training Methods for Language Models

Post-training adapts language models for specific, safe, and practical uses. This overview highlights key methods and the open-source training_hub library.

Getting Reasoning Models Enterprise Ready

Customize reasoning models with synthetic data generation for enterprise deployment. Learn techniques from Red Hat's AI Innovation Team.

Beyond tokens per second: Unlocking smarter enterprise AI with inference-time scaling

Discover inference-time scaling techniques that improve AI quality and reliability for enterprise applications beyond just speed optimization.

Async-GRPO - Open, Fast, and Performant

Introducing Async-GRPO - an open-source library for scalable reinforcement learning with 42% efficiency gains over VERL and 11x over TRL for GRPO training.

Sculpting Subspaces: How We Solved Continual Learning in Large Language Models

Learn how our adaptive SVD method enables continual learning in LLMs with near-zero catastrophic forgetting, achieving 7% higher accuracy than baselines.

Update 3 - On Reasoning vs Inference-time scaling - Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

Understanding the distinction between reasoning and inference-time scaling in LLMs - insights from our R1 reproduction experiments.

Update 2 - Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

Second update on R1 reasoning research - new results on training small LLMs with synthetic reasoning data and particle filtering methods.

Update 1 - Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

First update on R1-like reasoning experiments - Granite models show significant gains with particle filtering and new data quality experiments.

Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

Learn how to reproduce R1-like reasoning in small LLMs using particle filtering, synthetic data, and GRPO - achieving GPT-4o accuracy with only 4 rollouts.