News
This repository contains the implementation of "Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment", which has been accepted to ...