
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning



In this paper, we propose R$^3$: Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models.
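
The summary above does not spell out the mechanics, so the following is only an illustrative sketch of one way a reverse curriculum with outcome-only rewards could be set up. It assumes that a correct step-by-step demonstration is available, that exploration starts near the end of that demonstration and moves backward toward the bare question, and that the only reward is whether the final answer matches. The names `reverse_curriculum_starts` and `outcome_reward` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def reverse_curriculum_starts(question: str, demo_steps: List[str]) -> List[Tuple[int, str]]:
    """Build (steps_left, prompt) pairs, ordered from easiest to hardest.

    Assumption: each stage hands the model the first k demonstration steps
    and asks it to finish the remaining ones. k shrinks from n - 1 to 0,
    so the start state slides from the end of the demonstration to its start.
    """
    stages = []
    n = len(demo_steps)
    for k in range(n - 1, -1, -1):
        prefix = "\n".join(demo_steps[:k])
        prompt = question if k == 0 else question + "\n" + prefix
        stages.append((n - k, prompt))
    return stages


def outcome_reward(predicted_answer: str, gold_answer: str) -> float:
    """Outcome-only signal: 1.0 if the final answer matches, else 0.0."""
    return 1.0 if predicted_answer.strip() == gold_answer.strip() else 0.0


if __name__ == "__main__":
    demo = ["3 apples + 4 apples = 7 apples", "7 apples - 2 apples = 5 apples"]
    for steps_left, prompt in reverse_curriculum_starts("How many apples remain?", demo):
        print(steps_left, repr(prompt))
```

In this sketch no intermediate step is ever scored. Early stages leave only a short continuation to complete, so a final-answer reward there reflects mostly the last steps, which is one plausible route by which outcome supervision could approximate step-level feedback.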


