next update llm week 1

This commit is contained in:
John 2024-03-04 20:50:00 +01:00
parent 418154dfd5
commit 4f3d7f2b8c
6 changed files with 31 additions and 3 deletions


@@ -173,6 +173,7 @@ Notice that in contrast to the blue bars, the probability is more evenly spread
and there is more variability in the output compared to a cool temperature setting, which can help you generate text that sounds more creative. If you leave the temperature value equal to one, the softmax function is applied unaltered and the original probability distribution is used.
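A minimal NumPy sketch of how the temperature value rescales the logits before the softmax (the logits, function name, and values here are illustrative, not from the course material): a temperature below one sharpens the distribution, one leaves it unchanged, and above one flattens it.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Scale logits by temperature, apply softmax, then sample one token id."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()                          # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs), probs

logits = [2.0, 1.0, 0.2, -1.0]                      # toy next-token logits
for t in (0.5, 1.0, 2.0):                           # cool, default, warm
    _, p = sample_with_temperature(logits, temperature=t)
    print(f"temperature={t}: {np.round(p, 3)}")
```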
## Generative AI project lifecycle
autoregressive models
![Generative AI project lifecycle](images/Generative%20AI%20project%20lifecycle.png)
@@ -216,7 +217,7 @@ Encoder-only models are also known as Autoencoding models, and they are pre-trai
![AutoencodingModels](images/AutoencodingModels.png)
a. **Autoencoding models** build **bi-directional representations** of the input sequence, meaning that the model has an understanding of the full context of a token and not just of the words that come before it. Encoder-only models are ideally suited to tasks that benefit from this bi-directional context. You can use them to carry out **sentence classification tasks**, for example **sentiment analysis**, or _token-level tasks_ like **named entity recognition** or **word classification**. Some well-known examples of autoencoding models are **BERT** and **RoBERTa**.
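As an illustration of this bi-directional context (a minimal sketch, assuming the Hugging Face `transformers` library is installed; the model and sentence are arbitrary examples, not from these notes), a fill-mask pipeline lets BERT predict a masked token using the words on both sides of it:

```python
from transformers import pipeline

# Encoder-only (autoencoding) model: predicts the masked token from
# context on both the left and the right of the [MASK] position.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for pred in fill_mask("The movie was absolutely [MASK], I loved every minute."):
    print(f"{pred['token_str']:>12}  score={pred['score']:.3f}")
```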
**Decoder-only or autoregressive models**
The training objective is to predict the next token based on the previous sequence of tokens. Predicting the next token is sometimes called full language modeling.
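A small sketch of the full-language-modeling objective (PyTorch, with toy shapes and a random tensor standing in for real decoder output): the target at each position is simply the next token in the sequence, and the loss is cross-entropy over those shifted targets.

```python
import torch
import torch.nn.functional as F

# Toy setup: batch of 2 sequences, 6 tokens each, vocabulary of 100.
vocab_size, batch, seq_len = 100, 2, 6
token_ids = torch.randint(0, vocab_size, (batch, seq_len))
logits = torch.randn(batch, seq_len, vocab_size)    # stand-in for decoder output

# Next-token prediction: position t is trained to predict token t+1.
predictions = logits[:, :-1, :]                     # predictions for positions 0..T-2
targets = token_ids[:, 1:]                          # the tokens that actually follow

loss = F.cross_entropy(predictions.reshape(-1, vocab_size), targets.reshape(-1))
print(loss.item())
```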
@@ -231,11 +232,10 @@ from model to model.
![sequence-to-sequenceModel](images/sequence-to-sequenceModel.png)
A popular sequence-to-sequence model, **T5**, pre-trains the encoder using span corruption, which masks random sequences of input tokens. Those masked sequences are then
replaced with a unique sentinel token, shown here as x. Sentinel tokens are special tokens added to the vocabulary that do not correspond to any actual word from the input text. The decoder is then tasked with reconstructing the masked token sequences auto-regressively. The output is the sentinel token followed by the predicted tokens. You can use sequence-to-sequence models for **translation**, **summarization**, and **question-answering**. Well-known encoder-decoder models are **BART** and **T5**.
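A simplified sketch of span corruption (illustrative only: it works on whole words and masks a single span, whereas T5 samples multiple spans over subword tokens): the masked span is replaced by a sentinel token in the encoder input, and the decoder target is the sentinel followed by the original tokens.

```python
import random

def span_corrupt(tokens, span_len=2, seed=0):
    """Mask one random span of tokens and replace it with a sentinel token."""
    rng = random.Random(seed)
    start = rng.randrange(len(tokens) - span_len)
    encoder_input = tokens[:start] + ["<extra_id_0>"] + tokens[start + span_len:]
    decoder_target = ["<extra_id_0>"] + tokens[start:start + span_len] + ["<extra_id_1>"]
    return encoder_input, decoder_target

words = "the quick brown fox jumps over the lazy dog".split()
enc, dec = span_corrupt(words)
print("encoder input :", " ".join(enc))
print("decoder target:", " ".join(dec))
```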
### Summary: a comparison
![Summary of model architectures and pre-training objectives](images/SummaryModelArchiandPretraning.png)
@@ -251,4 +251,32 @@ Model capability with size has driven the development of larger and larger model
## Computational challenges of training LLMs
The most common issue: `OutOfMemoryError: CUDA out of memory.`
CUDA = Compute Unified Device Architecture
Weights:
- 1 parameter = 4 bytes (32-bit float)
- 1B parameters = 4 x 10^9 bytes = 4 GB
In general, the GPU memory needed to train a 1B-parameter model is about 6 times the model size (optimizer states, gradients, activations, and temporary variables on top of the weights) = 24 GB at 32-bit full precision.
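A quick back-of-the-envelope check of these numbers (a sketch, with the 6x multiplier taken as the rule of thumb above; the larger model sizes are just example inputs):

```python
BYTES_PER_PARAM_FP32 = 4      # one 32-bit float per weight
TRAINING_OVERHEAD = 6         # rule of thumb: optimizer states, gradients, activations

def training_memory_gb(n_params, bytes_per_param=BYTES_PER_PARAM_FP32):
    weights_gb = n_params * bytes_per_param / 1e9
    return weights_gb, weights_gb * TRAINING_OVERHEAD

for n in (1e9, 7e9, 70e9):    # 1B, 7B, 70B parameters
    w, t = training_memory_gb(n)
    print(f"{n/1e9:>5.0f}B params: weights ~{w:.0f} GB, training ~{t:.0f} GB")
```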
**Quantization**: reducing the precision from 32-bit floating-point numbers to 16-bit floating-point numbers or 8-bit integers.
![quantization](images/quantization1.png)
BFLOAT16 (BF16) keeps FP32's 8-bit exponent but truncates the fraction to 7 bits, so it preserves the dynamic range of full precision at half the memory. The downside is that BF16 is not well suited for integer calculations, but these are relatively rare in deep learning.
![quantization Summary](images/quantizationSummary.png.png)
So: a 4 GB model @ 32-bit full precision -> 2 GB @ 16-bit half precision (16-bit quantized) -> 1 GB @ 8-bit precision (8-bit quantized).
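A small PyTorch sketch of that memory effect (the tensor is just a stand-in for model weights, and the int8 line is a plain cast rather than a calibrated quantization scheme): each halving of precision roughly halves the bytes per value.

```python
import torch

weights = torch.randn(1_000_000)                    # stand-in for FP32 model weights

for dtype in (torch.float32, torch.float16, torch.bfloat16, torch.int8):
    t = weights.to(dtype)
    mb = t.element_size() * t.nelement() / 1e6
    print(f"{str(dtype):>15}: {t.element_size()} bytes/value, {mb:.1f} MB total")
```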
![GPU RAM needed for larger models](images/GPURAMbneeded.png)
![video Computational challenges of training LLMs ](images/ComputationalChallengesOfTrainingLLMs.mp4)
![Efficient multi-GPU compute strategies](images/Efficient%20multi-GPU%20compute%20strategies.mp4)
