Week 3 last

This commit is contained in:
John 2024-04-16 16:37:19 +02:00
parent bfbe4fdc01
commit d72d4f5101
7 changed files with 128 additions and 21 deletions

## Reinforcement learning from human feedback (RLHF)
RLHF helps to align the model with human values.
For example, LLMs sometimes produce **harmful content** or use a **toxic tone or voice**. By aligning the model with **human feedback**, using **reinforcement learning** as the algorithm, you can steer it toward less harmful and more helpful content.
One potentially exciting application of RLHF is the **personalization of LLMs**.
### How does RLHF work?
**Reinforcement learning** is a type of machine learning in which an **agent** learns to make decisions related to a **specific goal** by taking actions in an environment, with the objective of **maximizing** some notion of a **cumulative reward**.
![RLHF advantages](images/RLHF2.png)
In the RLHF setting, the agent's objective is to generate text that is, for example, helpful, accurate, and non-toxic.
The environment is the context window of the model: the space in which text can be entered via a prompt.
At any given moment, the action that the model will take, meaning which token it will choose next, depends on the prompt text in the context and the probability distribution over the vocabulary space. The reward is assigned based on how closely the completions align with human preferences.
A **reward model** classifies the outputs of the LLM and evaluates the degree of alignment with human preferences. It plays a central role in how the model updates its weights over many iterations.
The sequence of actions and states is called a **rollout**.
[Learning to summarize from human feedback](images/Learningtosummarizefromhumanfeedback.pdf)
[Fine-Tune LLMs with RLHF](https://huggingface.co/blog/trl-peft)
**Constitutional AI** is a method for training models using a set of rules and principles that govern the model's behavior.
[Constitutional AI: Harmlessness from AI Feedback paper](images/ConstitutionalAI.pdf)
### Introduction: Model optimizations for deployment
Increase performance -> reduce LLM size, which reduces inference latency.
The challenge is to reduce the size of the model while still maintaining model performance.
![Generative AI Project Lifecycle Cheat Sheet](images/GenerativeAIProjectLifecycleCheatSheet.png)
### Interaction with external data
![LLM-Powered Application](images/PowerApplications1.png)
**LangChain** is an example of an orchestration library.
Retrieval Augmented Generation (**RAG**) is a great way to overcome the knowledge cutoff issue (the world has changed since the model was trained on data current to that date) and help the model update its understanding of the world.
The external data store could be a vector store, a SQL database, CSV files, or Wikis.
![RAG](images/RAG2.png)
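The retrieve-then-augment flow can be sketched in a few lines. This is a toy illustration, not a production RAG pipeline: the document list, the word-overlap relevance score (standing in for vector-embedding similarity), and the prompt template are all hypothetical.

```python
# Minimal RAG sketch: retrieve relevant text from an external store,
# then prepend it to the prompt as context for the LLM.
DOCUMENTS = [
    "The Eiffel Tower was completed in 1889.",
    "Python was created by Guido van Rossum.",
    "The knowledge cutoff limits what a model knows.",
]

def _words(text):
    # Crude tokenizer: lowercase words with trailing punctuation stripped.
    return {w.strip(".,?").lower() for w in text.split()}

def score(query, doc):
    # Toy relevance: count of shared words (a stand-in for cosine
    # similarity over embeddings in a real vector store).
    return len(_words(query) & _words(doc))

def retrieve(query, k=1):
    return sorted(DOCUMENTS, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query):
    context = "\n".join(retrieve(query))
    return f"Use the context to answer.\nContext:\n{context}\nQuestion: {query}"

print(build_prompt("Who created Python?"))
```

The LLM never needs to have memorized the fact; the retriever injects it into the context window at query time, which is why RAG helps with the knowledge cutoff.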
### Interaction with external applications
- Connecting LLMs to external applications allows the model to interact with the broader world, extending its utility beyond language tasks, e.g. querying databases.
- LLMs can be used to trigger actions when given the ability to interact with APIs.
- LLMs can also connect to other programming resources, e.g. a Python interpreter to make calculations.
- Workflow:
1. The LLM needs to generate a set of instructions so that the application knows what actions to take.
2. The completion needs to be formatted in a way that the broader application can understand, e.g. generate a Python script or SQL command.
3. The model may need to collect information that allows it to validate an action. Any information required for validation needs to be obtained from the user and contained in the completion so it can be passed through to the application.
- Structuring the prompts in the correct way is important for all of these tasks and can make a huge difference in the quality of the generated plan or the adherence to a desired output format specification.
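The three workflow steps above can be sketched as follows. Everything here is a hypothetical stand-in chosen for illustration: the JSON action format, the `query_database` action name, the required-field validation rule, and the handler function.

```python
import json

# Hypothetical structured completion the LLM was prompted to produce
# (step 2: a format the broader application can parse).
completion = '{"action": "query_database", "args": {"order_id": "1234", "email": "a@b.com"}}'

# Step 1: the application knows which actions exist and how to run them.
HANDLERS = {"query_database": lambda args: f"status of order {args['order_id']}"}
# Step 3: information that must come from the user before the action is valid.
REQUIRED = {"query_database": ["order_id", "email"]}

def dispatch(completion):
    instruction = json.loads(completion)  # parse the model's instructions
    action, args = instruction["action"], instruction["args"]
    missing = [f for f in REQUIRED[action] if f not in args]
    if missing:
        # Validation info is absent: go back to the user instead of acting.
        return f"ask user for: {', '.join(missing)}"
    return HANDLERS[action](args)

print(dispatch(completion))  # -> status of order 1234
```

If the completion omits a required field, `dispatch` asks for it rather than triggering the action, which is the validation step in miniature.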
### Helping LLMs reason and plan with chain-of-thought
Complex reasoning can be challenging for LLMs, especially for problems that involve:
- multiple steps
- mathematics
Solution for multiple steps:
- prompting the model to think more like a human, by breaking the problem down into steps => this behavior is known as **chain-of-thought prompting**.
[Chain-of-Thought Prompting](images/Chain-of-ThoughtPrompting.pdf)
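A chain-of-thought prompt is just a one-shot (or few-shot) prompt whose worked example spells out the intermediate steps. A minimal sketch, assuming the classic tennis-ball example from the chain-of-thought literature as the demonstration:

```python
# One-shot chain-of-thought prompt: the worked example shows intermediate
# reasoning, nudging the model to break its own answer into steps too.
COT_EXAMPLE = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
    "How many balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
)

def cot_prompt(question):
    # Append the new question; the trailing "A:" invites step-by-step output.
    return COT_EXAMPLE + f"Q: {question}\nA:"

print(cot_prompt("A cafe had 23 apples, used 20, and bought 6 more. How many now?"))
```

The only change versus a plain one-shot prompt is that the demonstration answer contains the reasoning steps, not just the final number.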
Solution mathematics:
- The model will not make any **mathematical calculations** itself, it's just reasoning => solution: allow the model to interact with applications that do the calculations => **Program-aided language models (PAL)**
![PAL Architecture](images/PAL.png)
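The PAL idea in miniature: the model writes its reasoning as Python, and an interpreter, not the LLM, performs the arithmetic. The `completion` string below is a hypothetical model output; a real PAL setup would generate it from a few-shot prompt of similar reasoning-as-code examples.

```python
# Hypothetical PAL-style completion: reasoning expressed as commented code.
completion = """
# Roger started with 5 balls
initial = 5
# he bought 2 cans of 3 balls each
bought = 2 * 3
# total
answer = initial + bought
"""

namespace = {}
exec(completion, namespace)  # the Python interpreter does the calculation
print(namespace["answer"])   # -> 11
```

Because the interpreter computes `answer`, the final number is exact even when the model's own arithmetic would be unreliable.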
### Framework ReAct: Combining reasoning and action
**ReAct** is a prompting strategy that combines chain of thought reasoning with action planning.
ReAct uses structured examples to show a large language model how to reason through a problem and decide on actions to take that move it closer to a solution. It's important to note that in the ReAct framework, the LLM can only choose from a limited number of actions that are defined by a set of instructions pre-pended to the example prompt text.
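One ReAct turn can be sketched like this: parse the model's Thought/Action output, reject actions outside the allowed set, run the tool, and return an Observation for the next turn. The `lookup_population` tool, its data, and the scripted model output are all illustrative stand-ins.

```python
# Toy ReAct step: the model may only pick actions from a predefined set,
# and each action produces an Observation fed back into the next prompt.
TOOLS = {"lookup_population": {"France": "68 million"}.get}
ALLOWED_ACTIONS = set(TOOLS)

def run_step(model_output):
    # Expect the format: "Thought: ...\nAction: name[argument]"
    action_line = [l for l in model_output.splitlines() if l.startswith("Action:")][0]
    name, arg = action_line[len("Action: "):].rstrip("]").split("[")
    if name not in ALLOWED_ACTIONS:
        # Enforce the limited action set defined in the prompt instructions.
        return "Observation: invalid action"
    return f"Observation: {TOOLS[name](arg)}"

step = "Thought: I need France's population.\nAction: lookup_population[France]"
print(run_step(step))  # -> Observation: 68 million
```

Alternating Thought/Action/Observation turns like this one are exactly what a ReAct prompt demonstrates to the model.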
One solution that is being widely adopted is called **LangChain**; the LangChain framework provides you with modular pieces that contain the components necessary to work with LLMs.
LangChain has a set of predefined chains that have been optimized for different use cases.
LangChain defines another construct, known as an agent, that you can use to interpret the input from the user and determine which tool or tools to use to complete the task. LangChain currently includes agents for both PAL and ReAct, among others.
![Langchain](images/Langchain.png)
**Larger models** are generally your best choice for techniques that use advanced prompting, like **PAL or ReAct**.
**Smaller models** may struggle to understand the tasks in highly structured prompts and may require additional fine-tuning to improve their ability to reason and plan.
[ReAct paper](images/ReAct-Paper.pdf)
[Github LangChain](https://github.com/langchain-ai/langchain)
### LLM application architectures
![Building Generative Apps](images/BuildingAppsLLM.png)
- Make use of your on-premises infrastructure, or have it provided for you via on-demand, pay-as-you-go cloud services.
- Include the large language models you want to use in your application. These could include foundation models, as well as models you have adapted to your specific task.
- Retrieve information from external sources, such as those discussed in the retrieval augmented generation section.
- The application will return the completions from your large language model to the user or consuming application. Depending on your use case, you may need to implement a mechanism to capture and store the outputs, and gather feedback from users that may be useful for additional fine-tuning, alignment, or evaluation as your application matures.
- You may need additional tools and frameworks for large language models that help you easily implement some of these techniques.
- The final layer typically has some type of user interface through which the application is consumed, such as a website or a REST API. This layer is also where you'll include the security components required for interacting with your application.
- At a high level, this architecture stack represents the various components to consider as part of your generative AI applications.
### Responsible AI
Special challenges of responsible generative AI
- **Toxicity**: the LLM returns responses that can be potentially harmful or discriminatory towards protected groups or protected attributes
How to mitigate?
- Careful curation of training data
- Train guardrail models to filter out unwanted content
- Diverse group of human annotators
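The guardrail idea can be sketched as a filter that sits between the LLM and the user. A real guardrail is a trained classifier model; here a hypothetical blocklist (with placeholder terms) stands in, just to show where the filter sits in the flow.

```python
# Toy guardrail sketch: screen completions before they reach the user.
# BLOCKLIST is a placeholder; a real system uses a trained classifier.
BLOCKLIST = {"badword1", "badword2"}

def guardrail(completion):
    words = set(completion.lower().split())
    if words & BLOCKLIST:
        # Refuse instead of forwarding unwanted content.
        return "I can't help with that."
    return completion

print(guardrail("a perfectly harmless reply"))
```

The same checkpoint can also screen user inputs before they reach the model.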
- **Hallucinations**: the LLM generates factually incorrect content
How to mitigate?
- Educate users about how generative AI works
- Add disclaimers
- Augment LLMs with independent, verified citation databases
- Always trace back to where the information came from (watermarks, fingerprints)
- Define intended/unintended use cases
- **Intellectual Property**: ensure people aren't plagiarizing and that there aren't any copyright issues
How to mitigate?
- Mix of technology, policy, and legal mechanisms
- Machine "unlearning"
- Filtering and blocking approaches
There's a new concept of machine unlearning in which protected content or its effects on generative AI outputs are reduced or removed.
#### Responsibly build and use generative AI models
- Define use cases: the more specific/narrow, the better
- Assess risks for each use case
- Evaluate performance for each use case
- Iterate over the entire AI lifecycle
- Create governance policies

created: 2021-05-04 14:58:11Z
```bash
# list the fingerprints of all local SSH keys (the .pub files would
# repeat each fingerprint, hence uniq)
for keyfile in ~/.ssh/id_*; do ssh-keygen -l -f "${keyfile}"; done | uniq
```
Ed25519 is intended to provide attack resistance comparable to quality 128-bit symmetric ciphers.
```bash
# generate a new Ed25519 key; -o forces the new OpenSSH key format and
# -a 100 uses 100 KDF rounds to slow brute-forcing of the passphrase
ssh-keygen -o -a 100 -t ed25519
```
This creates the key pair:
```bash
~/.ssh/id_ed25519
~/.ssh/id_ed25519.pub
```
### Change or set a passphrase
```bash
# -p changes the passphrase of an existing key, re-saving it with the
# new key format (-o) and 100 KDF rounds (-a 100)
ssh-keygen -f ~/.ssh/id_rsa -p -o -a 100
```
[Source](https://blog.g3rt.nl/upgrade-your-ssh-keys.html)
## Server
```bash
ssh-keygen -f ~/.ssh/id_rsa -p -o -a 100 # don't use
ssh-keygen -t rsa -b 4096 -C "<your_email@domain.com>"
```
### On client
``` bash
ssh-copy-id remote_username@server_ip_address
# or if ssh-copy-id is not available
cat ~/.ssh/id_rsa.pub | ssh remote_username@server_ip_address "mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys"
```
[source](https://linuxize.com/post/how-to-setup-passwordless-ssh-login/)
[config ssh](https://ubuntuhandbook.org/index.php/2024/04/install-ssh-ubuntu-2404/)