TY - GEN
T1 - When Are 1.58 Bits Enough? A Bottom-up Exploration of Quantization-Aware Training with Ternary Weights
AU - Nielsen, Jacob
AU - Galke, Lukas
AU - Schneider-Kamp, Peter
N1 - Publisher Copyright:
© 2025 by SCITEPRESS – Science and Technology Publications, Lda.
PY - 2025
Y1 - 2025
N2 - Contemporary machine learning models, such as language models, are powerful, but come with immense resource requirements both at training and inference time. Quantization-aware pre-training with ternary weights (1.58 bits per weight) has shown promising results in decoder-only language models and facilitates memory-efficient inference. However, little is known about how quantization-aware training influences the training dynamics beyond such Transformer-based decoder-only language models. Here, we engage in a bottom-up exploration of quantization-aware training, starting with multi-layer perceptrons and graph neural networks. Then, we explore 1.58-bit training in other transformer-based language models: encoder-only and encoder-decoder models. Our results show that in all of these settings, 1.58-bit training is on par with standard 32/16-bit models, yet we also identify challenges specific to 1.58-bit encoder-decoder models. Our results on decoder-only language models hint at a possible regularization effect introduced by quantization-aware training.
AB - Contemporary machine learning models, such as language models, are powerful, but come with immense resource requirements both at training and inference time. Quantization-aware pre-training with ternary weights (1.58 bits per weight) has shown promising results in decoder-only language models and facilitates memory-efficient inference. However, little is known about how quantization-aware training influences the training dynamics beyond such Transformer-based decoder-only language models. Here, we engage in a bottom-up exploration of quantization-aware training, starting with multi-layer perceptrons and graph neural networks. Then, we explore 1.58-bit training in other transformer-based language models: encoder-only and encoder-decoder models. Our results show that in all of these settings, 1.58-bit training is on par with standard 32/16-bit models, yet we also identify challenges specific to 1.58-bit encoder-decoder models. Our results on decoder-only language models hint at a possible regularization effect introduced by quantization-aware training.
KW - Graph Neural Networks
KW - Language Models
KW - Quantization
KW - Quantization-Aware Training
KW - Text Classification
U2 - 10.5220/0013382400003890
DO - 10.5220/0013382400003890
M3 - Article in proceedings
AN - SCOPUS:105001956130
SN - 978-989-758-737-5
VL - 3
T3 - International Conference on Agents and Artificial Intelligence
SP - 1440
EP - 1449
BT - Proceedings of the 17th International Conference on Agents and Artificial Intelligence
T2 - 17th International Conference on Agents and Artificial Intelligence, ICAART 2025
Y2 - 23 February 2025 through 25 February 2025
ER -