inference-time-scaling

4 Articles
Update 3 - On Reasoning vs Inference-time scaling - Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

Update 3 - On Reasoning vs Inference-time scaling - Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

Understanding the distinction between reasoning and inference-time scaling in LLMs - insights from our R1 reproduction experiments.

Update 2 - Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

Update 2 - Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

Second update on R1 reasoning research - new results on training small LLMs with synthetic reasoning data and particle filtering methods.

Update 1 - Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

Update 1 - Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

First update on R1-like reasoning experiments - Granite models show significant gains with particle filtering and new data quality experiments.

Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

Learn how to reproduce R1-like reasoning in small LLMs using particle filtering, synthetic data, and GRPO - achieving GPT-4o accuracy with only 4 rollouts.

reasoning

4 Articles
Update 3 - On Reasoning vs Inference-time scaling - Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

Update 3 - On Reasoning vs Inference-time scaling - Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

Understanding the distinction between reasoning and inference-time scaling in LLMs - insights from our R1 reproduction experiments.

Update 2 - Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

Update 2 - Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

Second update on R1 reasoning research - new results on training small LLMs with synthetic reasoning data and particle filtering methods.

Update 1 - Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

Update 1 - Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

First update on R1-like reasoning experiments - Granite models show significant gains with particle filtering and new data quality experiments.

Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

Lessons on Reproducing R1-like Reasoning in Small LLMs without using DeepSeek-R1-Zero (or its derivatives)

Learn how to reproduce R1-like reasoning in small LLMs using particle filtering, synthetic data, and GRPO - achieving GPT-4o accuracy with only 4 rollouts.