Discussion about this post

Daniel Popescu / ⧉ Pluralisk

It's interesting how you framed the RLHF part. I really appreciate the deep dive into human feedback; it makes you think about how critical the initial policy choice is for these models.
