arxiv:2505.08651

Scaling Context, Not Parameters: Training a Compact 7B Language Model for Efficient Long-Context Processing

Published on May 13, 2025

AI-generated summary

A 7B language model with a 512K-token context length achieves superior in-context learning, robust retrieval, and competitive long-range reasoning on long-context benchmarks.

Abstract

We present MegaBeam-Mistral-7B, a language model that supports a 512K-token context length. Our work addresses practical limitations in long-context training, supporting real-world tasks such as compliance monitoring and verification. Evaluated on three long-context benchmarks, our 7B-parameter model demonstrates superior in-context learning performance on HELMET and robust retrieval and tracing capability on RULER. It is currently the only open model to achieve competitive long-range reasoning on BABILong at 512K context length without RAG or targeted fine-tuning. Released as fully open source under the Apache 2.0 license, the model has been downloaded over 100,000 times on Hugging Face. Model available at: https://huggingface.co/aws-prototyping/MegaBeam-Mistral-7B-512k
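
As a usage illustration (not from the paper), the sketch below shows one way to load the released checkpoint with the Hugging Face transformers library. The model ID comes from the abstract; the dtype, device settings, prompt, and generation parameters are assumptions, and serving prompts anywhere near the full 512K-token window in practice requires substantial GPU memory or a dedicated long-context inference stack.

```python
# Minimal sketch: loading MegaBeam-Mistral-7B-512k with Hugging Face transformers.
# The model ID is taken from the abstract; dtype/device choices and the prompt
# below are illustrative assumptions, not settings prescribed by the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aws-prototyping/MegaBeam-Mistral-7B-512k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 to fit the 7B model on one GPU
    device_map="auto",
)

# A long document followed by a question; in principle the prompt may span
# up to 512K tokens, memory permitting.
prompt = (
    "Long document text...\n\n"
    "Question: What obligations does clause 4 impose?\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```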
