Popular repositories
- DPO_controlled_sentiment_generation (Python): Implementation of the research paper "DPO: Your Language Model is Secretly a Reward Model" (the DPO objective is sketched below).
- DPO_tldr_summarisation: Second experiment (TL;DR summarisation) of the paper "DPO: Your Language Model is Secretly a Reward Model".
- RLHF_Sentiment_Alignment (Python): RLHF implementation of the sentiment alignment experiment from the DPO paper (a sentiment-reward sketch follows below).
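For context, the objective the two DPO repositories implement is a classification-style loss over preference pairs: the policy is trained so that its log-probability ratio against a frozen reference model is higher for the preferred response than for the rejected one. Below is a minimal PyTorch sketch of that loss, not code taken from these repositories; the function name, argument names, and the choice of `beta=0.1` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss over a batch of preference pairs.

    Each argument is a (batch,) tensor of summed log-probabilities of a
    full response under the policy or the frozen reference model.
    `beta` controls how far the policy may drift from the reference.
    """
    # Implicit rewards: scaled log-prob ratios of policy vs. reference
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss on the reward margin: push chosen above rejected
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

In this formulation the language model itself acts as the reward model (the paper's title), so no separate reward network or RL loop is needed.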
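The RLHF baseline, by contrast, needs an explicit reward signal. For the sentiment alignment task this is typically an off-the-shelf sentiment classifier scoring the generations. Below is a minimal sketch of such a reward function; the checkpoint name `lvwerra/distilbert-imdb` and the label strings are assumptions, not necessarily what RLHF_Sentiment_Alignment uses.

```python
from transformers import pipeline

# Assumed reward model: an IMDB sentiment classifier (hypothetical choice)
sentiment = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")

def sentiment_reward(texts):
    """Map generated texts to scalar rewards in [0, 1]:
    the classifier's probability of the POSITIVE class."""
    outputs = sentiment(texts, truncation=True)
    return [
        out["score"] if out["label"] == "POSITIVE" else 1.0 - out["score"]
        for out in outputs
    ]

# Example: more positive text should receive a higher reward
print(sentiment_reward(["The movie was wonderful!", "A dull, lifeless film."]))
```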
