LoRA Soft Prompt

This commit is contained in:
John 2024-03-26 17:59:08 +01:00
parent dc9ab4e878
commit a704312272
12 changed files with 61 additions and 0 deletions

@@ -1,5 +1,7 @@
# Fine-tuning
With just a few hundred examples, you can tune a model to a specific task, which is truly amazing.
## Instruction fine-tuning
### Fine-tune an LLM with instruction prompts
@@ -106,9 +108,68 @@ Evaluation Benchmarks
![HELM](images/HELM.png)
## Parameter efficient fine-tuning (PEFT)
Unlike full fine-tuning, where every model weight is updated during supervised learning, parameter-efficient fine-tuning methods **only update a small subset of parameters**.
- Some techniques freeze most of the model weights and focus on fine-tuning a subset of the existing model parameters.
- Other techniques don't touch the original model weights at all, and instead add a small number of new parameters or layers and fine-tune only the new components.
- With **PEFT**, most if not all of the LLM weights are kept frozen. As a result, the number of trained parameters is much smaller than the number of parameters in the original LLM. PEFT can often be performed on a **single GPU**. PEFT is also **less prone to the catastrophic forgetting problems** of full fine-tuning.
![PEFT](images/PEFT1.png)
The PEFT weights are trained for each task and can be easily swapped out for inference, allowing efficient adaptation of the original model to multiple tasks.
![PEFT](images/PEFT2.png)
There are several methods, each with trade-offs in parameter efficiency, memory efficiency, training speed, model quality, and inference cost.
![PEFT](images/PEFT3.png)
- Adapter methods add new trainable layers to the architecture of the model, typically inside the encoder or decoder components after the attention or feed-forward layers.
- Soft prompt methods keep the model architecture fixed and frozen, and focus on manipulating the input to achieve better performance. This can be done by adding trainable parameters to the prompt embeddings, or by keeping the input fixed and retraining the embedding weights.
## LoRA (Low-Rank Adaptation of Large Language Models)
[LoRA paper](images/LoRA.pdf)
[QLoRA paper](images/QLora.pdf)
Re-parameterization: LoRA freezes the original model weights and injects a pair of small rank-decomposition matrices whose product is added to the frozen weights.
![LoRA](images/LoRA1.png)
![LoRA Example](images/LoRA2.png)
LoRA can be used to train many tasks, swapping the small task-specific matrices in and out at inference time:
![LoRA many tasks](images/LoRA3.png)
In principle, the smaller the rank r, the smaller the number of trainable parameters and the bigger the savings on compute. However, ranks that are too small can degrade model quality, so there is a trade-off to consider.
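As a rough illustration of the savings (the layer sizes here are hypothetical, not taken from the paper): a frozen weight matrix of shape d × k is adapted as W + BA, where B is d × r and A is r × k, so only r × (d + k) weights are trained.

```python
# Illustrative LoRA parameter count (hypothetical layer sizes).
d, k, r = 512, 64, 8
full_params = d * k            # 32,768 weights updated in full fine-tuning
lora_params = r * (d + k)      # 4,608 trainable weights in the B and A matrices
print(f"LoRA trains {lora_params}/{full_params} weights "
      f"({100 * lora_params / full_params:.1f}% of the layer)")  # ~14.1%
```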
LoRA is broadly used in practice because of its comparable performance to full fine-tuning on many tasks and datasets.
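A minimal sketch of applying LoRA with the Hugging Face `peft` library; the base model and hyperparameters below are illustrative choices, not prescribed by these notes:

```python
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

# Load a base model to adapt (model choice is illustrative).
base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

lora_config = LoraConfig(
    r=8,                         # rank of the low-rank update matrices
    lora_alpha=32,               # scaling applied to the LoRA update
    target_modules=["q", "v"],   # adapt the attention query/value projections
    lora_dropout=0.05,
    task_type=TaskType.SEQ_2_SEQ_LM,
)

# Wrap the frozen base model; only the LoRA matrices are trainable.
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()
```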
**QLoRA**: combines LoRA with quantization techniques to further reduce the memory footprint.
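A sketch of the QLoRA recipe using 4-bit quantization from `transformers` together with `peft`; the model name and settings are assumptions for illustration:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit NF4 precision to shrink the memory footprint.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",  # illustrative model choice
    quantization_config=bnb_config,
)

# Prepare the quantized model for training (casts norms, enables checkpointing).
model = prepare_model_for_kbit_training(model)

# Standard LoRA adapters are then trained on top of the quantized weights.
qlora_config = LoraConfig(r=8, lora_alpha=32, lora_dropout=0.05,
                          task_type=TaskType.CAUSAL_LM)
model = get_peft_model(model, qlora_config)
```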
## Soft prompts
[The Power of Scale for Parameter-Efficient Prompt Tuning paper](images/PromptTuning.pdf)
Prompt tuning is NOT prompt engineering: rather than hand-crafting prompt text, trainable soft-prompt vectors are prepended to the input embeddings.
![Soft Prompt](images/Softprompt1.png)
Only the soft prompt weights are updated.
![Soft Prompt](images/Softprompt2.png)
For models with large numbers of parameters, prompt tuning can be as effective as full fine-tuning, and it offers a significant performance boost over prompt engineering alone.
The words closest to the soft prompt tokens in embedding space have similar meanings, and they usually relate to the task, suggesting that the prompts learn word-like representations.
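A minimal prompt tuning sketch with `peft`, assuming a causal LM base model; the model name and virtual token count are illustrative:

```python
from transformers import AutoModelForCausalLM
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative base model

# Prepend 20 trainable virtual tokens; every original weight stays frozen.
pt_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.RANDOM,
    num_virtual_tokens=20,
)
pt_model = get_peft_model(model, pt_config)
pt_model.print_trainable_parameters()  # only the soft prompt embeddings train
```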
Python libs:
- evaluate
- rouge_score
- loralib
- peft
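For example, the `evaluate` library (backed by `rouge_score`) can score model outputs with ROUGE; the prediction and reference strings below are placeholders:

```python
import evaluate  # pip install evaluate rouge_score

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["the model summarized the meeting"],
    references=["the model produced a summary of the meeting"],
)
print(scores)  # rouge1 / rouge2 / rougeL / rougeLsum scores
```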
