Abstract
Contemporary machine learning models, such as language models, are powerful but come with immense resource requirements at both training and inference time. Quantization-aware pre-training with ternary weights (1.58 bits per weight) has shown promising results in decoder-only language models and facilitates memory-efficient inference. However, little is known about how quantization-aware training influences the training dynamics beyond such Transformer-based decoder-only language models. Here, we engage in a bottom-up exploration of quantization-aware training, starting with multi-layer perceptrons and graph neural networks. Then, we explore 1.58-bit training in other transformer-based language models: encoder-only and encoder-decoder models. Our results show that in all of these settings, 1.58-bit training is on par with standard 32/16-bit models, yet we also identify challenges specific to 1.58-bit encoder-decoder models. Our results on decoder-only language models hint at a possible regularization effect introduced by quantization-aware training.
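To make the idea of 1.58-bit quantization-aware training concrete, the sketch below shows a minimal ternary linear layer in PyTorch, assuming a BitNet-b1.58-style absmean quantizer combined with a straight-through estimator. The class name `TernaryLinear`, the scaling rule, and the epsilon value are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TernaryLinear(nn.Linear):
    """Linear layer whose weights are quantized to {-1, 0, +1} in the forward pass.

    Quantization-aware training keeps full-precision "shadow" weights; a
    straight-through estimator lets gradients flow to them as if no
    quantization had happened.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Absmean scaling (assumed here, following BitNet b1.58-style quantization).
        scale = w.abs().mean().clamp(min=1e-5)
        # Round to the nearest ternary level, then rescale back.
        w_ternary = (w / scale).round().clamp(-1, 1) * scale
        # Straight-through estimator: forward uses ternary weights,
        # backward treats the quantizer as the identity.
        w_q = w + (w_ternary - w).detach()
        return F.linear(x, w_q, self.bias)

# Drop-in replacement for nn.Linear inside an MLP, GNN layer, or Transformer block.
layer = TernaryLinear(128, 64)
x = torch.randn(32, 128)
layer(x).sum().backward()        # gradients reach the full-precision weights
print(layer.weight.grad.shape)   # torch.Size([64, 128])
```

At inference time, only the ternary weights and the per-tensor scale would need to be stored, which is the source of the memory savings the abstract refers to.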
| Original language | English |
|---|---|
| Title | Proceedings of the 17th International Conference on Agents and Artificial Intelligence |
| Volume | 3 |
| Publication date | 2025 |
| Pages | 1440-1449 |
| ISBN (Print) | 978-989-758-737-5 |
| DOI | |
| Status | Published - 2025 |
| Event | 17th International Conference on Agents and Artificial Intelligence, ICAART 2025 - Porto, Portugal. Duration: 23 Feb 2025 → 25 Feb 2025 |
Conference
| Conference | 17th International Conference on Agents and Artificial Intelligence, ICAART 2025 |
|---|---|
| Country/Territory | Portugal |
| City | Porto |
| Period | 23/02/2025 → 25/02/2025 |
| Name | International Conference on Agents and Artificial Intelligence |
|---|---|
| ISSN | 2184-3589 |
Bibliographic note
Publisher Copyright: © 2025 by SCITEPRESS – Science and Technology Publications, Lda.