# Generative AI & LLMs
The Transformer model revolutionized the field of natural language processing (NLP) and became the basis for the LLMs we know today, such as GPT, PaLM and others.
Transformer models replace traditional recurrent neural networks (RNNs) and convolutional neural networks (CNNs) with an entirely attention-based mechanism.
The Transformer model uses self-attention to compute representations of input sequences, which allows it to capture long-term dependencies and parallelize computation effectively.
The Transformer architecture consists of an encoder and a decoder, each of which is composed of several layers. Each layer consists of two sub-layers: a multi-head self-attention mechanism and a feed-forward neural network. The multi-head self-attention mechanism allows the model to attend to different parts of the input sequence, while the feed-forward network applies a point-wise fully connected layer to each position separately and identically.
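To make the self-attention sub-layer concrete, here is a minimal NumPy sketch of a single attention head (multi-head attention runs several of these in parallel and concatenates the results). The names and sizes are illustrative placeholders, not the paper's exact implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over a sequence x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v       # project each token to query, key, value
    scores = q @ k.T / np.sqrt(k.shape[-1])   # relevance of every token to every other token
    weights = softmax(scores, axis=-1)        # attention weights, each row sums to 1
    return weights @ v, weights               # weighted sum of values per position

# Toy example: 4 tokens, model dimension 8 (illustrative sizes)
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)  # (4, 8) (4, 4)
```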
The Transformer model also uses residual connections and layer normalization to facilitate training and prevent overfitting. In addition, the authors introduce a positional encoding scheme that encodes the position of each token in the input sequence, enabling the model to capture the order of the sequence without the need for recurrent or convolutional operations.
[Paper: Transformers: Attention is all you need.](https://arxiv.org/pdf/1706.03762.pdf)
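The sinusoidal positional encoding described in the paper can be sketched in a few lines of NumPy; each position gets a fixed vector that is added to the corresponding token embedding before the first layer (sizes here are illustrative).

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding (sketch of the scheme from the paper)."""
    positions = np.arange(seq_len)[:, None]                          # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                               # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                            # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                            # odd dimensions use cosine
    return pe

print(positional_encoding(seq_len=10, d_model=16).shape)             # (10, 16)
```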
## Introduction
This section covers large language models and their use cases, how the models work, prompt engineering, how to produce creative text outputs, and a project lifecycle for generative AI projects.
Generative AI is a subset of traditional machine learning. And the machine learning models that underpin generative AI have learned these abilities by finding statistical patterns in massive datasets of content that was originally generated by humans.
Foundation models are sometimes called base models. Examples are GPT, BERT, LLaMA, BLOOM, FLAN-T5 and PaLM.
This novel approach unlocked the progress in generative AI that we see today. It can be **scaled efficiently** to use multi-core GPUs, it can **parallel process input data**, making use of much larger training datasets, and crucially, it's able to learn **to pay attention to the meaning of the words it's processing**.
The power of the transformer architecture lies in its ability to learn the relevance and context of all of the words in a sentence, and to apply attention weights to those relationships so that the model learns the relevance of each word to every other word, no matter where they appear in the input.
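As a toy illustration of what those attention weights look like, here is a hypothetical weight matrix (made-up numbers, not taken from a trained model) relating every word in a short sentence to every other word, regardless of how far apart they are.

```python
import numpy as np

words = ["the", "teacher", "taught", "the", "student"]
# Hypothetical attention weights: each row sums to 1 and says how strongly
# that word attends to every other word in the sentence.
weights = np.array([
    [0.05, 0.40, 0.20, 0.05, 0.30],  # "the" attends mostly to "teacher"
    [0.05, 0.30, 0.45, 0.05, 0.15],  # "teacher" attends strongly to "taught"
    [0.05, 0.35, 0.20, 0.05, 0.35],  # "taught" links "teacher" and "student"
    [0.05, 0.10, 0.20, 0.05, 0.60],  # second "the" attends to "student"
    [0.05, 0.25, 0.50, 0.05, 0.15],  # "student" attends to "taught"
])
for word, row in zip(words, weights):
    print(f"{word!r} attends most to {words[int(row.argmax())]!r}")
```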
[Video Transformer Architecture](images/TransformersArchitecture.mp4)
## Example Prediction Process
![example](images/exampleTransformerTranslation.png)
**Encoder**: encodes the input with contextual understanding and produces one vector per input token.
**Decoder**: accepts input tokens and generates new tokens.
1. Tokenize the input words using the same tokenizer that was used to train the network.
2. These tokens are added to the input on the encoder side of the network.
3. The tokens are passed through the embedding layer.
4. The embeddings are fed into the multi-headed attention layers.
5. The outputs of the multi-headed attention layers are fed through a feed-forward network to the output of the encoder.
At this point, the data that leaves the encoder is a deep representation of the structure and meaning of the input sequence.
This representation is inserted into the middle of the decoder to influence
the decoder's self-attention mechanisms.
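A minimal sketch of how the encoder output can enter the decoder's attention layers: the decoder positions supply the queries, while the encoder output supplies the keys and values (often called cross-attention). The weight matrices and sizes below are illustrative placeholders.

```python
import numpy as np

def cross_attention(decoder_states, encoder_output, w_q, w_k, w_v):
    """Decoder attention over the encoder output (sketch): queries come from the
    decoder positions, keys and values come from the encoder output, so every
    output position can attend to every input token."""
    q = decoder_states @ w_q
    k = encoder_output @ w_k
    v = encoder_output @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over input positions
    return weights @ v

rng = np.random.default_rng(1)
d_model = 8
encoder_output = rng.normal(size=(5, d_model))       # 5 input tokens from the encoder
decoder_states = rng.normal(size=(2, d_model))       # 2 tokens generated so far
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(cross_attention(decoder_states, encoder_output, w_q, w_k, w_v).shape)  # (2, 8)
```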
1. A start-of-sequence token is added to the input of the decoder.
2. This triggers the decoder to predict the next token, which it does based on the contextual understanding that it's being provided from the encoder.
3. The output of the decoder's self-attention layers gets passed through the decoder feed-forward network and through a final softmax output layer.
At this point, we have our first token. You'll continue this loop, passing the output token back to the input to trigger the generation of the next token, until the model predicts an end-of-sequence token.
At this point, the final sequence of tokens can be detokenized into words, and you have your output.
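A minimal sketch of that generation loop. `decode_step` is a hypothetical stand-in for the real decoder plus softmax layer (it returns random probabilities here); it is only meant to show the control flow of start token, predict, feed back, stop at end-of-sequence, detokenize.

```python
import numpy as np

VOCAB = {0: "<sos>", 1: "<eos>", 2: "le", 3: "chat", 4: "dort"}
SOS_ID, EOS_ID = 0, 1
rng = np.random.default_rng(0)

def decode_step(encoder_output, generated_ids):
    """Stand-in for the decoder + softmax: returns a probability distribution
    over the vocabulary for the next token (random here, model-driven in practice)."""
    logits = rng.normal(size=len(VOCAB))
    if len(generated_ids) > 4:                 # force termination for the demo
        logits[EOS_ID] = 10.0
    e = np.exp(logits - logits.max())
    return e / e.sum()

def generate(encoder_output, max_len=10):
    generated = [SOS_ID]                                   # 1. start-of-sequence token
    while len(generated) < max_len:
        probs = decode_step(encoder_output, generated)     # 2-3. decoder + softmax
        next_id = int(probs.argmax())                      # greedy choice of next token
        if next_id == EOS_ID:                              # stop at end-of-sequence
            break
        generated.append(next_id)                          # feed the token back in as input
    return [VOCAB[i] for i in generated[1:]]               # detokenize (drop <sos>)

print(generate(encoder_output=None))
```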
There are multiple ways in which you can use the output from the softmax layer to predict the next token.
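For example, here is a short sketch contrasting greedy selection with random sampling and temperature re-scaling, applied to a hypothetical softmax output:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["cat", "dog", "bird", "fish"]
probs = np.array([0.5, 0.3, 0.15, 0.05])   # hypothetical softmax output

# Greedy decoding: always take the single highest-probability token.
greedy = vocab[int(probs.argmax())]

# Random sampling: draw the next token according to the probabilities,
# which introduces variety into the generated text.
sampled = rng.choice(vocab, p=probs)

# Temperature re-scaling: temperature < 1 sharpens the distribution
# (more deterministic), temperature > 1 flattens it (more varied/creative).
def apply_temperature(p, temperature):
    logits = np.log(p) / temperature
    e = np.exp(logits - logits.max())
    return e / e.sum()

print(greedy, sampled, apply_temperature(probs, temperature=2.0))
```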
## Encoder-only, encoder-decoder, and decoder-only architectures
**Encoder-only models** also work as sequence-to-sequence models, but without further modification, the input sequence and the output sequence are the same length.
Their use is less common these days, but by adding additional layers to the architecture, you can train encoder-only models to perform classification tasks such as sentiment analysis. **BERT** is an example of an encoder-only model.
**Encoder-decoder models**, as you've seen, perform well on sequence-to-sequence tasks such as translation, where the input sequence and the output sequence can be different lengths. Examples are BART and T5.
**Decoder-only models** are some of the most commonly used today. Examples are the GPT family of models, BLOOM, Jurassic, LLaMA, and many more.
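A quick sketch of the three families in practice, using the Hugging Face `transformers` library (assuming it is installed; the named checkpoints are common public models chosen purely for illustration):

```python
from transformers import pipeline

# Encoder-only (BERT-style): classification tasks such as sentiment analysis.
classifier = pipeline("sentiment-analysis")          # defaults to a BERT-family checkpoint
print(classifier("Transformers made this course much easier to follow."))

# Encoder-decoder (T5/BART-style): sequence-to-sequence tasks such as translation.
translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("The teacher taught the student."))

# Decoder-only (GPT-style): open-ended text generation.
generator = pipeline("text-generation", model="gpt2")
print(generator("Large language models are", max_new_tokens=20))
```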
[Video Generating text with transformers](images/VideoGeneratingTextWithTransformers.mp4)
