Post by account_disabled on Mar 14, 2024 9:16:26 GMT
MIG, that is, these are devices at different levels of functionality. The specifications of the A Ada look more promising than those of the A, but it is also more expensive, so a logical question arises: is it worth overpaying? To find the answer, we took an LLM in three sizes and gave both video cards a test drive.

Test: two GPUs solve the same problems under the same conditions

We test the video cards on additional training (fine-tuning) of a large language model in different sizes. Ideally, to train an LLM, for every billion of its parameters you need about
GB of video memory: not only the model is placed in GPU RAM, but also other components: gradients, optimizer states, temporary buffers, and so on. That is, using only one video card, as in our experiment, we would not be able to complete training of a great many LLMs. There are different ways out: for example, combining two cards via NVLink (or more cards via NVSwitch), or applying quantization or another method of reducing the model's size. Since we are interested in a fair test of each card, we will not combine them to increase computing resources, but will go down the path of quantization, and we will still run into
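As a rough sketch of where that memory goes (a common rule of thumb, not the article's exact figure): full fine-tuning with Adam in mixed precision costs on the order of 16 bytes per parameter, namely 2 B for fp16 weights, 2 B for gradients, and about 12 B for fp32 optimizer state (master weights plus two Adam moments), before counting activations and temporary buffers.

```python
# Rough memory estimate for full fine-tuning with the Adam optimizer
# in mixed precision. Per parameter: 2 B weights (fp16) + 2 B gradients
# + ~12 B optimizer state (fp32 master copy + two Adam moments) ~= 16 B.
# Activations and temporary buffers come on top of this.

def estimate_train_memory_gb(params_billion: float,
                             bytes_per_param: int = 16) -> float:
    """Return an order-of-magnitude GPU memory estimate in GB."""
    # 1e9 params * 16 B/param = 16e9 B ~= 16 GB per billion parameters
    return params_billion * bytes_per_param

print(estimate_train_memory_gb(7))  # a 7B model: on the order of 112 GB
```

This is exactly why a single card cannot hold full fine-tuning of larger models, and why quantization plus LoRA is used instead.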
a lack of memory. So, the models are loaded in quantized form using the bitsandbytes library. To make training possible at all, we use the LoRA approach implemented in the peft library. The LoRA configuration (the rank of the multipliers) remains unchanged, with one exception: when starting training of the largest model on the A, we lowered the rank of the decomposed matrices to save memory and make it possible to at least load the model and a batch into GPU memory.

Preparing the working environment

Let's prepare the working environment for testing. We need to install quite a few packages, but this is done with literally two scripts.
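A minimal sketch of the setup described above, assuming the Hugging Face transformers + bitsandbytes + peft stack; the model name, LoRA rank, and target modules here are placeholder values for illustration, not the article's exact configuration:

```python
# Hypothetical sketch: load an LLM in quantized 4-bit form via bitsandbytes,
# then attach LoRA adapters via peft so that only the low-rank matrices train.
# Requires: pip install transformers peft bitsandbytes accelerate torch
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Quantization config: weights stored in 4-bit NF4, compute in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # placeholder model name
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA config: r is the rank of the decomposed matrices; lowering it
# (as done for the largest model in the test) trades capacity for memory.
lora_config = LoraConfig(
    r=16,                          # placeholder rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The key design point is that the frozen base weights sit in GPU memory in 4-bit form, while only the small LoRA matrices are trained in higher precision, which is what makes fine-tuning feasible on a single card at all.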