RLHF update

# Reinforcement learning and LLM-powered applications
**RLHF** helps to align the model with human values.
For example, LLMs can sometimes produce **harmful content** or adopt a **toxic tone or voice**. By aligning the model with **human feedback and using reinforcement learning** as the algorithm, you can steer it toward **less harmful** and much more helpful completions.
## Reinforcement learning from human feedback (RLHF)
RLHF aligns the model with important human values: **helpfulness, honesty, and harmlessness**. These are sometimes collectively called **HHH**, a set of principles that guides developers in the responsible use of AI.
![RLHF advantages](images/RLHF1.png)
One potentially exciting application of RLHF is the **personalization of LLMs**, where models learn the preferences of each individual user through a continuous feedback process. This could lead to new technologies like individualized learning plans or personalized AI assistants.
### How RLHF works
**Reinforcement learning** is a type of machine learning in which an **agent** learns to make decisions related to a **specific goal** by taking actions in an environment, with the objective of **maximizing** some notion of a **cumulative reward**.
![RLHF advantages](images/RLHF2.png)
In the RLHF setting:
- The **agent** is the LLM, and its objective is to generate text that is perceived as aligned with human preferences: text that is, for example, helpful, accurate, and non-toxic.
- The **environment** is the context window of the model, the space in which text can be entered via a prompt.
- The **state** is the text currently contained in the context window. At any given moment, the **action** the model takes, meaning which token it chooses next, depends on the prompt text in the context and the probability distribution over the vocabulary space (see the sketch below).
- The **reward** is assigned based on how closely the completions align with human preferences.
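To make the notion of an "action" concrete, the following sketch samples a next token from the model's probability distribution over the vocabulary. GPT-2 and the example prompt are stand-ins chosen only because they are small and freely available, not the model discussed here.

```python
# Minimal sketch: one "action" in the RLHF framing is choosing the next token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Summarize the following article in a friendly, non-toxic tone:"
inputs = tokenizer(prompt, return_tensors="pt")        # the state: text in the context window

with torch.no_grad():
    logits = model(**inputs).logits[:, -1, :]          # scores for every token in the vocabulary
probs = torch.softmax(logits, dim=-1)                  # probability distribution over the vocabulary

next_token = torch.multinomial(probs, num_samples=1)   # the action: sample one token
print(tokenizer.decode(next_token[0]))
```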
The **reward model** classifies the outputs of the LLM and evaluates the degree of alignment with human preferences. It plays a central role in how the model's weights are updated over many iterations.
The sequence of actions and states is called a **rollout**.
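As an illustration of what the reward model does, the sketch below scores a completion with an off-the-shelf sentiment classifier standing in for a trained reward model. The checkpoint name and the choice of "probability of POSITIVE as the reward" are assumptions made for this example, not the actual setup.

```python
# Sketch: an off-the-shelf sentiment classifier used as a stand-in reward model.
# A real RLHF reward model is instead trained on human preference data.
from transformers import pipeline

reward_model = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # illustrative checkpoint (assumption)
)

completion = "Thanks for asking! Here is a clear, respectful summary of the article."
result = reward_model(completion)[0]  # e.g. {'label': 'POSITIVE', 'score': 0.98}

# Treat the probability of the POSITIVE label as the scalar reward (an assumption for this sketch).
reward = result["score"] if result["label"] == "POSITIVE" else 1.0 - result["score"]
print(f"reward = {reward:.3f}")
```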
### RLHF: Obtaining feedback from humans
![RLHF advantages](images/RLHF3.png)
The clarity of your instructions can make a big difference in the quality of the human feedback you obtain.
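Human labelers typically rank several completions of the same prompt, and those rankings are then expanded into pairwise (chosen, rejected) examples used to train the reward model. The sketch below uses made-up feedback and a hypothetical `rank_to_pairs` helper to show that expansion.

```python
from itertools import combinations

def rank_to_pairs(prompt, ranked_completions):
    """Expand a human ranking (best first) into (prompt, chosen, rejected) pairs.

    A ranking of k completions yields k*(k-1)/2 pairwise comparisons,
    the format commonly used to train the reward model.
    """
    pairs = []
    for better, worse in combinations(ranked_completions, 2):
        pairs.append({"prompt": prompt, "chosen": better, "rejected": worse})
    return pairs

# Hypothetical feedback: a labeler ranked three completions from best to worst.
ranking = [
    "A concise, polite summary of the article.",
    "A rambling but harmless summary.",
    "A dismissive, mildly toxic reply.",
]
for pair in rank_to_pairs("Summarize the article.", ranking):
    print(pair["chosen"], ">", pair["rejected"])
```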
[Learning to summarize from human feedback](images/Learningtosummarizefromhumanfeedback.pdf)
[Fine-Tune LLMs with RLHF](https://huggingface.co/blog/trl-peft)
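The linked blog post fine-tunes models with Hugging Face's `trl` library. The outline below follows the PPO loop described there; names such as `PPOTrainer.generate` and `PPOTrainer.step` come from older `trl` releases and may differ in current versions, so treat this as a sketch of the flow rather than exact API.

```python
# Outline of the trl PPO fine-tuning loop (older trl API; a sketch, not a reference implementation).
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

config = PPOConfig(model_name="gpt2", learning_rate=1.41e-5)  # gpt2 as a small stand-in model
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)      # policy to tune
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)  # frozen reference
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

# Rollout: the current policy generates a completion for a prompt.
query = tokenizer("Explain RLHF in one sentence:", return_tensors="pt").input_ids[0]
response = ppo_trainer.generate([query], return_prompt=False, max_new_tokens=30)[0]

# Reward: one scalar per completion, e.g. produced by a reward model like the one sketched earlier.
reward = torch.tensor(0.8)  # placeholder value for illustration

# PPO update: raise the likelihood of high-reward completions while staying
# close to the reference model (via a KL penalty handled inside trl).
ppo_trainer.step([query], [response], [reward])
```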
**Constitutional AI** is a method for training models using a set of rules and principles that govern the model's behavior.
[Constitutional AI: Harmlessness from AI Feedback paper](images/ConstitutionalAI.pdf)
