When Are 1.58 Bits Enough? A Bottom-up Exploration of Quantization-Aware Training with Ternary Weights

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review


Abstract

Contemporary machine learning models, such as language models, are powerful but come with immense resource requirements at both training and inference time. Quantization-aware pre-training with ternary weights (1.58 bits per weight) has shown promising results in decoder-only language models and facilitates memory-efficient inference. However, little is known about how quantization-aware training influences the training dynamics beyond such Transformer-based decoder-only language models. Here, we engage in a bottom-up exploration of quantization-aware training, starting with multi-layer perceptrons and graph neural networks. Then, we explore 1.58-bit training in other Transformer-based language models: encoder-only and encoder-decoder models. Our results show that in all of these settings, 1.58-bit training is on par with standard 32/16-bit models, yet we also identify challenges specific to 1.58-bit encoder-decoder models. Our results on decoder-only language models hint at a possible regularization effect introduced by quantization-aware training.
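To make the core idea concrete, below is a minimal sketch of quantization-aware training with ternary (1.58-bit) weights, in the style of BitNet b1.58's absmean quantizer. This is an illustrative PyTorch example, not the authors' implementation; the class name `TernaryLinear` and all hyperparameters are assumptions for demonstration only.

```python
# Minimal sketch (assumed, not the paper's code): ternary quantization-aware
# training with a straight-through estimator, so gradients update the latent
# full-precision weights while the forward pass uses weights in {-1, 0, +1}.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TernaryLinear(nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Absmean scaling, then round-and-clip to the ternary set {-1, 0, +1}.
        scale = w.abs().mean().clamp(min=1e-5)
        w_q = (w / scale).round().clamp(-1, 1) * scale
        # Straight-through estimator: forward uses quantized weights,
        # backward passes gradients through to the full-precision weights.
        w_ste = w + (w_q - w).detach()
        return F.linear(x, w_ste, self.bias)


if __name__ == "__main__":
    layer = TernaryLinear(16, 4)
    x = torch.randn(8, 16)
    loss = layer(x).pow(2).mean()
    loss.backward()  # gradients reach layer.weight via the STE
```

At inference time the quantized weights can be stored with roughly log2(3) ≈ 1.58 bits per weight, which is the source of the memory savings described in the abstract.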

Original language: English
Title of host publication: Proceedings of the 17th International Conference on Agents and Artificial Intelligence
Volume: 3
Publication date: 2025
Pages: 1440-1449
ISBN (Print): 978-989-758-737-5
DOIs
Publication status: Published - 2025
Event: 17th International Conference on Agents and Artificial Intelligence, ICAART 2025 - Porto, Portugal
Duration: 23 Feb 2025 - 25 Feb 2025

Conference

Conference: 17th International Conference on Agents and Artificial Intelligence, ICAART 2025
Country/Territory: Portugal
City: Porto
Period: 23/02/2025 - 25/02/2025
Series: International Conference on Agents and Artificial Intelligence
ISSN: 2184-3589

Keywords

  • Graph Neural Networks
  • Language Models
  • Quantization
  • Quantization-Aware Training
  • Text Classification
