Reasoning with long Chain-of-Thoughts

We are still, in many ways, trying to understand how our own brains work. As AI advances, much experimentation and thinking now centers on how to get models to ‘think’: challenging themselves and sequentially questioning their own output with long chains-of-thought (CoTs).

We are pleased to share a very interesting paper on this topic below, which sheds light on how combining supervised fine-tuning (teaching the model desired outputs from examples) and reinforcement learning (letting the model try, and learn from rewards) with varying reward functions can improve chain-of-thought reasoning.
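To make the "varying reward functions" idea concrete, here is a minimal sketch of a length-shaped reward for RL training, in the spirit of the cosine length-scaling reward discussed in the paper. The endpoint values and function name are illustrative assumptions, not the paper's exact settings: correct answers earn more reward when the CoT is short, while wrong answers are penalized less when the CoT is long (encouraging the model to keep thinking when it is not yet right).

```python
import math

def cosine_length_reward(correct, length, max_length,
                         r_correct_short=1.0, r_correct_long=0.5,
                         r_wrong_short=-1.0, r_wrong_long=-0.5):
    """Illustrative length-shaped reward (endpoint values are assumptions).

    Cosine-interpolates between a short-CoT endpoint and a long-CoT
    endpoint as the chain-of-thought grows toward max_length.
    """
    # t goes 0 -> 1 as the CoT length approaches max_length.
    t = min(length, max_length) / max_length
    # Pick the endpoints for the correct/incorrect case.
    lo, hi = ((r_correct_long, r_correct_short) if correct
              else (r_wrong_long, r_wrong_short))
    # cos(pi * t) moves from 1 to -1, sliding the reward from `hi` to `lo`.
    return lo + 0.5 * (hi - lo) * (1 + math.cos(math.pi * t))

# A short correct CoT earns the full reward; a long one earns less.
print(cosine_length_reward(True, 10, 1000))    # near 1.0
print(cosine_length_reward(True, 1000, 1000))  # 0.5
```

During RL, a reward like this replaces a flat correct/incorrect signal, giving the policy a gradient over *how long* to think, not just whether the final answer is right.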

Demystifying Long Chain-of-Thought Reasoning in LLMs

Source: Demystifying Long Chain-of-Thought Reasoning in LLMs paper
