RLHF: Reinforcement Learning from Human Feedback | by Ms Aerin


Like everyone else, I am experiencing closed research for the first time. From my college days until recently, all frontier research was open and peer-reviewed. And I believe openness ultimately advances science more than closedness does.

If we aim to match the performance of ChatGPT through open source, I believe we need to start taking training data more seriously. A substantial part of ChatGPT’s effectiveness might not come from, say, a specific ML architecture, fine-tuning technique, or framework. More likely, it comes from the breadth, scale, and quality of the instruction data.

To put it bluntly, fine-tuning large language models on mediocre instruction data is a waste of compute. Let’s take a look at what has changed in the training data and the learning paradigm: how we now format the training data differently, and therefore learn differently, than in past large-scale pre-training.
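To make the formatting difference concrete, here is a minimal sketch that contrasts a raw pre-training sample with an instruction-style record. The field names (`instruction`, `response`), the `### Instruction:` / `### Response:` template, and the example texts are all assumptions chosen for illustration; they are not the format of any particular dataset or of ChatGPT's training data.

```python
# Pre-training data is typically plain text that the model learns to continue.
pretraining_sample = "Mitochondria produce most of the ATP that a cell uses for energy."

# Instruction data pairs a request with the desired answer.
# The keys below are hypothetical, picked only for this example.
instruction_sample = {
    "instruction": "Explain in one sentence what mitochondria do.",
    "response": "Mitochondria produce most of the ATP that a cell uses for energy.",
}


def format_for_fine_tuning(record: dict) -> str:
    """Join an instruction and its response into one training string.

    The template is an assumption; real projects each define their own.
    """
    instruction = record["instruction"].strip()
    response = record["response"].strip()
    return f"### Instruction:\n{instruction}\n\n### Response:\n{response}"


formatted = format_for_fine_tuning(instruction_sample)
print(formatted)
```

The content of the two samples is the same. What changes is that the instruction record tells the model which request the text is an answer to, which is why the quality of those pairs matters so much.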

RLHF stands for Reinforcement Learning from Human Feedback. It has two main components:

  1. Reinforcement Learning (RL)
  2. Human Feedback (HF)
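One common way these two components connect, sketched here under assumptions, is through pairwise comparisons: a human picks the better of two responses, and a reward model is trained so that the preferred response scores higher. The RL step then optimizes the language model against that learned reward. The record layout and the reward scores below are hypothetical; the loss is the standard pairwise logistic form, -log(sigmoid(r_chosen - r_rejected)), and nothing here is taken from a specific system.

```python
import math

# Hypothetical human feedback record: one prompt, two candidate responses,
# and the human's choice between them.
feedback = {
    "prompt": "Summarize RLHF in one sentence.",
    "chosen": "RLHF fine-tunes a model using rewards learned from human preferences.",
    "rejected": "RLHF is a thing for models.",
}


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise logistic loss: -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the reward model scores the human-preferred
    response higher, and large when it ranks the pair the wrong way round.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)).
    return math.log1p(math.exp(-margin))


# Made-up reward scores, standing in for the output of a learned reward model.
agrees_with_human = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
disagrees_with_human = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
print(agrees_with_human < disagrees_with_human)
```

Minimizing this loss over many comparisons is what turns human feedback into a numeric reward signal that the reinforcement learning step can use.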
