## Computational challenges of training LLMs
Most common issue: `OutOfMemoryError: CUDA out of memory`.
CUDA = Compute Unified Device Architecture
Weights:
- 1 parameter = 4 bytes (32-bit float)
- 1B parameters = $4 \times 10^9$ bytes = 4GB
In general, the GPU memory needed to train a 1B-parameter model is about 6 times the model size (weights plus gradients, Adam optimizer states, and activations) = 24GB.
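The numbers above can be sketched as a quick back-of-the-envelope calculation. The 6x overhead factor is the course's rule of thumb (weights, gradients, optimizer states, activations), not an exact measurement:

```python
def training_memory_gb(n_params, bytes_per_param=4, overhead_factor=6):
    """Rough GPU memory estimate for fp32 training.

    overhead_factor ~6 covers weights plus gradients, Adam optimizer
    states, and activations -- a rule of thumb, not an exact figure.
    """
    return n_params * bytes_per_param * overhead_factor / 1e9

weights_only_gb = 1e9 * 4 / 1e9       # 1B params * 4 bytes = 4 GB just to store
print(weights_only_gb)                 # 4.0
print(training_memory_gb(1e9))         # 24.0
```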
The downside is that BF16 is not well suited for integer calculations, but these are relatively rare in deep learning.
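To see why BF16 keeps the dynamic range of FP32 but less precision: a bfloat16 is simply the top 16 bits of an IEEE-754 float32 (1 sign bit, the full 8 exponent bits, and only 7 mantissa bits). A minimal sketch of the truncation, using just the standard library:

```python
import struct

def f32_to_bf16_bits(x):
    """Truncate a float32 to a bfloat16 bit pattern: keep sign,
    the full 8-bit exponent, and the top 7 mantissa bits."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 16

def bf16_bits_to_f32(bits):
    """Expand a bfloat16 bit pattern back to float32 (low bits zeroed)."""
    (x,) = struct.unpack(">f", struct.pack(">I", bits << 16))
    return x

pi_bf16 = bf16_bits_to_f32(f32_to_bf16_bits(3.141592653589793))
print(pi_bf16)   # 3.140625 -- only ~3 significant decimal digits survive
```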
![Quantization Summary](images/quantizationSummary.png)
So a 4GB model at 32-bit full precision becomes a 2GB model when quantized to 16-bit half precision, and a 1GB model when quantized to 8-bit precision.
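As a minimal illustration of how 8-bit quantization shrinks storage: a simple symmetric scheme maps each float onto an integer in [-127, 127] with one shared scale, so each value needs 1 byte instead of 4. (This is a sketch of the idea, not the exact method any particular library uses.)

```python
def quantize_int8(values):
    """Symmetric int8 quantization: one scale maps floats onto [-127, 127]."""
    scale = max(abs(v) for v in values) / 127
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]

weights = [0.5, -1.2, 0.03, 2.4]   # hypothetical fp32 weights
q, s = quantize_int8(weights)      # 1 byte per value instead of 4
restored = dequantize(q, s)        # close to the originals, small rounding error
```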
![GPU RAM needed for larger models](images/GPURAMbneeded.png)
[Video Computational challenges of training LLMs](images/ComputationalChallengesOfTrainingLLMs.mp4)
[Video Efficient multi-GPU compute strategies](images/Efficientmulti-GPUcomputestrategies.mp4)