LightOn and Answer.ai Release ModernBERT: A New Model Series that is a Pareto Improvement over BERT in both Speed and Accuracy
Since the release of BERT in 2018, encoder-only transformer models have been widely used in natural language processing (NLP) applications due to their efficiency in retrieval and classification tasks. However, these models face notable limitations in contemporary applications. Their sequence length, capped at 512 tokens, hampers their ability to handle long-context tasks effectively. Furthermore, their architecture, vocabulary, and computational efficiency have not kept pace with advancements in hardware and training methodologies. These shortcomings become especially apparent in retrieval-augmented generation (RAG) pipelines, where encoder-based models provide context for large language models (LLMs). Despite their critical role, these models often rely on outdated designs, limiting their capacity to meet evolving demands.

A team of researchers from LightOn, Answer.ai, Johns Hopkins University, NVIDIA, and Hugging Face has sought to address these challenges with the introduction of ModernBERT, an open family of encoder-only models. ModernBERT brings several architectural enhancements, extending the context length to 8,192 tokens, a significant improvement over the original BERT. This increase enables it to perform well on long-context tasks. The integration of Flash Attention 2 and rotary positional embeddings (RoPE) improves computational efficiency and positional understanding. Trained on 2 trillion tokens from diverse domains, including code, ModernBERT demonstrates improved performance across multiple tasks. It is available in two configurations: base (139M parameters) and large (395M parameters), offering options tailored to different needs while consistently outperforming models like RoBERTa and DeBERTa. Both checkpoints are hosted on Hugging Face (a brief loading sketch appears below).

Technical Details and Benefits

ModernBERT incorporates several advancements in transformer design. Flash Attention enhances memory and computational efficiency, while alternating global-local attention mechanisms optimize long-context processing. RoPE embeddings improve positional understanding, ensuring effective performance across varied sequence lengths. The model also employs GeGLU activation functions and a deep, narrow architecture for a balanced trade-off between efficiency and capability (an illustrative GeGLU sketch also appears below). Stability during training is further ensured through pre-normalization blocks and the use of the StableAdamW optimizer with a trapezoidal learning rate schedule. These refinements make ModernBERT not only faster but also more resource-efficient, particularly for inference on common GPUs.

Results and Insights

ModernBERT demonstrates strong performance across benchmarks. On the General Language Understanding Evaluation (GLUE) benchmark, it surpasses existing base models, including DeBERTaV3. In retrieval settings such as Dense Passage Retrieval (DPR) and ColBERT-style multi-vector retrieval, it achieves higher nDCG@10 scores than its peers. The model's capabilities on long-context tasks are evident in the MLDR benchmark, where it outperforms both older models and specialized long-context models such as GTE-en-MLM and NomicBERT. ModernBERT also excels in code-related tasks, including CodeSearchNet and StackOverflow-QA, benefiting from its code-aware tokenizer and diverse training data.
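As a quick illustration of how the released checkpoints can be tried out, the following is a minimal sketch of loading ModernBERT with the Hugging Face transformers library and running a masked-token prediction. The repository id and the need for a recent transformers release are assumptions made for illustration, not details stated in this article.

```python
# Minimal sketch: load ModernBERT from Hugging Face and predict a masked token.
# The repo id below and the requirement for a recent transformers version are
# assumptions for illustration, not details stated in the article.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "answerdotai/ModernBERT-base"  # assumed repo id; swap for the large checkpoint if preferred
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Decode the highest-scoring token at the masked position.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```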
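Similarly, the GeGLU feed-forward design mentioned in the technical details can be sketched in a few lines of PyTorch. This is an illustrative module only; the layer names, dimensions, and bias-free choice are hypothetical and do not reflect ModernBERT's actual implementation.

```python
# Illustrative GeGLU feed-forward block of the kind described above. Layer names,
# dimensions, and the bias-free choice are hypothetical, not ModernBERT's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLUFeedForward(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # A single input projection produces both the gate and the value halves.
        self.wi = nn.Linear(d_model, 2 * d_hidden, bias=False)
        self.wo = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate, value = self.wi(x).chunk(2, dim=-1)
        # GeGLU: the GELU-activated gate multiplies the linear value path elementwise.
        return self.wo(F.gelu(gate) * value)

# Example: a batch of 2 sequences of 8 tokens with hidden size 768.
x = torch.randn(2, 8, 768)
print(GeGLUFeedForward(d_model=768, d_hidden=1152)(x).shape)  # torch.Size([2, 8, 768])
```

Gating the feed-forward path in this way replaces the plain GELU MLP of the original BERT and pairs naturally with the deep, narrow layout described above.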
Additionally, ModernBERT processes significantly larger batch sizes than its predecessors, making it suitable for large-scale applications while maintaining memory efficiency.

Conclusion

ModernBERT represents a thoughtful evolution of encoder-only transformer models, integrating modern architectural improvements with robust training methodologies. Its extended context length and enhanced efficiency address the limitations of earlier models, making it a versatile tool for a variety of NLP applications, including semantic search, classification, and code retrieval. By modernizing the foundational BERT architecture, ModernBERT meets the demands of contemporary NLP tasks. Released under the Apache 2.0 license and hosted on Hugging Face, it provides an accessible and efficient solution for researchers and practitioners seeking to advance the state of the art in NLP.

Check out the Paper, Blog, and Model on Hugging Face. All credit for this research goes to the researchers of this project.