Meta AI Launches Faster, Smaller Quantized Versions of Llama 3.2 Models for Mobile AI Access
Meta AI’s new quantized Llama 3.2 models deliver faster performance at lower cost, bringing advanced AI to mobile devices and consumer electronics.
Meta AI has recently launched quantized versions of its Llama 3.2 models, designed to make capable AI easier to deploy across a wide range of devices, including mobile platforms. The quantized models come in 1B and 3B variants and, according to the company, reduce model size by 56% and inference time by 2–4x. This lets AI run on commodity hardware, which should lower the cost of adopting the technology.
Innovative Quantization Techniques Enhance Performance
Meta AI used two quantization methods to build these models: Quantization-Aware Training (QAT) with LoRA adapters, which prioritizes accuracy, and SpinQuant, a post-training method that prioritizes flexibility. This two-pronged approach lets Llama 3.2 retain high accuracy while using less memory and compute. It fits Meta’s goal of making AI more accessible by lowering hardware requirements and costs for businesses and researchers alike.
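To make the QAT idea concrete, here is a minimal sketch of the "fake quantization" step at the heart of quantization-aware training: during the forward pass, weights are rounded to a low-precision grid so the network learns to tolerate rounding error, while training otherwise continues in floating point. The function name and the per-tensor symmetric scheme are illustrative assumptions, not Meta's actual implementation (which also involves LoRA adapters).

```python
import numpy as np

def fake_quantize(w, num_bits=4):
    """Simulate low-precision rounding during training (fake quantization).

    In QAT, the forward pass sees quantized values; gradients are typically
    passed through as if rounding were the identity (straight-through estimator).
    """
    qmax = 2 ** (num_bits - 1) - 1                    # e.g. 7 for signed 4-bit
    scale = max(np.max(np.abs(w)) / qmax, 1e-8)       # per-tensor symmetric scale
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)  # integer grid
    return q * scale                                   # back to float for training

w = np.array([0.8, -0.31, 0.02, 1.2])
w_q = fake_quantize(w, num_bits=4)   # each value snapped to a 4-bit grid
```

Because the rounding happens inside the training loop, the model's weights adapt to the coarser grid, which is why QAT generally preserves more accuracy than quantizing only after training.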
These quantized Llama 3.2 models enable advanced NLP tasks on consumer-grade GPUs, CPUs, and even mobile SoCs from Qualcomm and MediaTek. In partnership with these industry players, Meta AI has tuned the models for deployment across various form factors, opening the door to broad adoption of AI in real-time applications and edge settings where computing resources are scarce.
Technically, quantization reduces the precision of model weights and activations to 8-bit and 4-bit formats, cutting memory use by 41% compared with the original BF16 format. Despite the smaller footprint, the quantized Llama 3.2 models retain roughly 95% of the original models’ performance on NLP benchmarks.
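The memory saving follows directly from the storage format: an int8 weight takes one byte, versus two for BF16 or four for FP32. A minimal sketch of symmetric per-tensor 8-bit quantization (the function names are illustrative, not Meta's API):

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization of float weights to int8."""
    scale = np.max(np.abs(weights)) / 127.0   # map the largest magnitude to 127
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for computation."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(1024,)).astype(np.float32)
q, scale = quantize_int8(w)
recovered = dequantize(q, scale)

# int8 storage: 1 byte/weight vs. 4 for FP32 (and 2 for BF16)
print(q.nbytes, w.nbytes)  # 1024 vs 4096
```

Each weight is recovered to within half a quantization step (`scale / 2`), which is why accuracy degrades only modestly even as storage shrinks by 2–4x.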
The release also underscores Meta AI’s focus on delivering AI solutions that are reliable and quick to scale while maintaining high standards of safety and performance. By easing the accessibility problems posed by previous large language models, the quantized Llama 3.2 models lower the barriers for industries pursuing AI solutions, supporting both AI equity and sustainability.