BitNet B1.58 Reloaded: State-of-the-Art Performance Also on Smaller Networks

Jacob Nielsen, Peter Schneider-Kamp*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review


Abstract

Recently proposed methods for 1-bit and 1.58-bit quantization-aware training investigate the performance and behavior of these methods in the context of large language models, finding state-of-the-art performance for models with more than 3B parameters. In this work, we investigate 1.58-bit quantization for small language and vision models ranging from 100K to 48M parameters. We introduce a variant of BitNet b1.58 that allows relying on the median rather than the mean in the quantization process. Through extensive experiments, we investigate the performance of 1.58-bit models obtained through quantization-aware training. We further investigate the robustness of 1.58-bit quantization-aware training to changes in the learning rate and regularization through weight decay, finding different patterns for small language and vision models than previously reported for large language models. Our results show that 1.58-bit quantization-aware training provides state-of-the-art performance for small language models when doubling hidden layer sizes, and reaches or even surpasses state-of-the-art performance for small vision models of identical size. Ultimately, we demonstrate that 1.58-bit quantization-aware training is also a viable and promising approach for training smaller deep learning networks, facilitating the deployment of such models in low-resource use cases and encouraging future research.
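
The quantization step described in the abstract can be sketched as follows. The snippet below is a minimal, illustrative PyTorch sketch of 1.58-bit (ternary) quantization-aware training: latent weights are scaled by the mean of their absolute values (as in BitNet b1.58) or, in the variant studied here, the median, rounded to {-1, 0, +1}, and trained with a straight-through estimator. The function and class names (quantize_weights_ternary, BitLinear158) and defaults are assumptions for illustration, not the authors' code, and the published BitNet b1.58 recipe additionally quantizes activations to 8 bits, which is omitted here.

```python
import torch


def quantize_weights_ternary(w: torch.Tensor, use_median: bool = True,
                             eps: float = 1e-5) -> torch.Tensor:
    """Ternary (1.58-bit) weight quantization with a straight-through estimator.

    Illustrative sketch of an absmean/absmedian scheme in the spirit of
    BitNet b1.58; names and defaults are assumptions.
    """
    # Scale: central tendency of the absolute latent weights.
    # BitNet b1.58 uses the mean of |W|; the variant here uses the median.
    stat = w.abs().median() if use_median else w.abs().mean()
    scale = stat.clamp(min=eps)

    # Project the scaled weights onto {-1, 0, +1}, then rescale so the
    # quantized tensor keeps roughly the magnitude of the latent weights.
    w_q = torch.clamp(torch.round(w / scale), -1.0, 1.0) * scale

    # Straight-through estimator: the forward pass sees the ternary weights,
    # while gradients flow unchanged to the full-precision latent weights.
    return w + (w_q - w).detach()


class BitLinear158(torch.nn.Linear):
    """Linear layer whose weights are ternarized on the fly during QAT."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_q = quantize_weights_ternary(self.weight)
        return torch.nn.functional.linear(x, w_q, self.bias)
```

After training, the latent full-precision weights can be converted once to their ternary form plus a single scale per tensor, which is what enables the low-resource deployment mentioned in the abstract.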

Original language: English
Title of host publication: Deep Learning Theory and Applications - 5th International Conference, DeLTA 2024, Proceedings
Editors: Ana Fred, Allel Hadjali, Oleg Gusikhin, Carlo Sansone
Publisher: Springer Science+Business Media
Publication date: 2024
Pages: 301-315
ISBN (Print): 9783031667046
DOIs
Publication status: Published - 2024
Event: 5th International Conference on Deep Learning Theory and Applications, DeLTA 2024 - Dijon, France
Duration: 10 Jul 2024 - 11 Jul 2024

Conference

Conference: 5th International Conference on Deep Learning Theory and Applications, DeLTA 2024
Country/Territory: France
City: Dijon
Period: 10/07/2024 - 11/07/2024
Series: Communications in Computer and Information Science
Volume: 2172 CCIS
ISSN: 1865-0929

Keywords

  • Deep learning
  • Green machine learning
  • Image classification
  • Quantization-aware training
  • Small language models
