Nvidia & Mistral AI Introduce Compact Minitron 8B: Efficiency Unleashed
Mistral AI and Nvidia have teamed up on the Minitron 8B, a compact but capable AI model refined through pruning and knowledge distillation.
Mistral AI and Nvidia have unveiled the Mistral-NeMo-Minitron 8B, a new artificial intelligence (AI) model that promises strong accuracy and efficiency despite its small size.
Following the release of the Mistral NeMo 12B model, the Minitron 8B was created using width pruning, a technique that shrinks a model by removing its least important internal dimensions, such as MLP neurons and embedding channels, while preserving most of its accuracy.
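To make the idea concrete, here is a minimal sketch of width pruning for one MLP block, assuming PyTorch and a precomputed per-neuron importance score (for example, mean activation magnitude on a calibration set). It is an illustration of the general technique, not Nvidia's implementation.

```python
import torch
import torch.nn as nn

def width_prune_mlp(linear_in: nn.Linear, linear_out: nn.Linear,
                    importance: torch.Tensor, keep: int):
    """Keep only the `keep` most important hidden neurons of an MLP block.

    `importance` holds one score per hidden neuron; everything else is dropped,
    which narrows the layer's width without changing its input/output sizes.
    """
    idx = importance.topk(keep).indices.sort().values  # neurons to keep, in order

    pruned_in = nn.Linear(linear_in.in_features, keep, bias=linear_in.bias is not None)
    pruned_out = nn.Linear(keep, linear_out.out_features, bias=linear_out.bias is not None)

    with torch.no_grad():
        pruned_in.weight.copy_(linear_in.weight[idx])       # rows correspond to hidden neurons
        if linear_in.bias is not None:
            pruned_in.bias.copy_(linear_in.bias[idx])
        pruned_out.weight.copy_(linear_out.weight[:, idx])  # columns correspond to hidden neurons
        if linear_out.bias is not None:
            pruned_out.bias.copy_(linear_out.bias)

    return pruned_in, pruned_out
```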
To produce the Minitron 8B, the Mistral NeMo 12B base model was width-pruned and then put through a light retraining pass using knowledge distillation. In knowledge distillation, a smaller "student" model is trained to reproduce the outputs of a larger "teacher" model, which lets the student retain much of the teacher's predictive quality while running faster and with fewer resources.
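A common way to express this student–teacher training is a temperature-softened KL-divergence loss between the two models' output distributions. The sketch below shows that standard formulation (following Hinton et al., 2015); the exact loss used for Minitron 8B may differ.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """KL divergence between softened teacher and student distributions.

    The teacher's logits are detached so they act as fixed targets; only the
    student receives gradients. Scaling by T^2 keeps gradient magnitudes
    comparable across temperatures.
    """
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```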
New research from Nvidia ("Compact Language Models via Pruning and Knowledge Distillation") shows that pruned-and-distilled models can outperform comparably sized models trained from scratch.
To create the Minitron 8B, the Mistral NeMo 12B model was first fine-tuned on 127 billion tokens, after which selected dimensions within the network were pruned away and the smaller model was distilled against the original. The result is a compact, efficient model that Nvidia reports is more accurate than comparably sized alternatives. Iterative pruning and distillation is Nvidia's strategy for deriving a whole family of progressively smaller models from a single base model, which the company says substantially reduces the compute cost of training them.
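At a high level, that family-building strategy amounts to repeating the prune-then-distill cycle, with each newly distilled model serving as the teacher for the next, smaller one. The loop below is a purely illustrative sketch; `prune` and `distill` are caller-supplied placeholders, not Nvidia APIs.

```python
from typing import Callable, List

def build_model_family(base_model, target_sizes: List[int],
                       prune: Callable, distill: Callable):
    """Illustrative prune-then-distill loop for producing a model family.

    `prune(model, size)` shrinks a model to a parameter budget and
    `distill(student, teacher)` lightly retrains it against its teacher.
    Only the original base model ever needs full-scale pretraining.
    """
    family, teacher = [base_model], base_model
    for size in target_sizes:                # e.g. successively smaller budgets
        student = prune(teacher, size)       # drop least-important widths
        student = distill(student, teacher)  # short knowledge-distillation run
        family.append(student)
        teacher = student                    # next stage distills from this model
    return family
```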