LAMB: A Training-Free Method to Enhance the Long-Context Understanding of SSMs via Attention-Guided Token Filtering
LAMB is a training-free method that significantly improves the long-context understanding of Mamba models through attention-guided token filtering.
Our analysis reveals that the performance degradation of state space models (SSMs) on long-context inputs primarily stems from exponential decay in hidden state memory, which can be effectively mitigated by preserving a small subset of critical tokens identified via their attention patterns. Motivated by this insight, LAMB introduces an attention-based metric for token selection, substantially enhancing the retention of critical context. Extensive evaluations demonstrate that LAMB achieves up to a 30.35% improvement over previous state-of-the-art techniques across various long-context benchmarks.
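At a high level, attention-guided token filtering can be pictured as keeping the top-k tokens that receive the most aggregate attention. The sketch below is a hypothetical illustration of such a selection metric (the function name, score aggregation, and toy data are our own, not LAMB's actual criterion, which is defined in the paper):

```python
def select_critical_tokens(attn, k):
    """Keep the k tokens that receive the highest total attention.

    attn: list of rows, one per query position; each row holds that
    query's attention weights over all tokens. This aggregation is a
    simplified stand-in for LAMB's attention-based metric.
    """
    num_tokens = len(attn[0])
    # Total attention mass each token receives across all queries.
    scores = [sum(row[j] for row in attn) for j in range(num_tokens)]
    # Indices of the k highest-scoring tokens.
    topk = sorted(range(num_tokens), key=lambda j: scores[j], reverse=True)[:k]
    # Return the kept tokens in their original sequence order.
    return sorted(topk)

# Toy example: 4 query positions attending over 6 tokens.
attn = [
    [0.1, 0.5, 0.1, 0.1, 0.1,  0.1],
    [0.2, 0.2, 0.4, 0.1, 0.05, 0.05],
    [0.1, 0.1, 0.1, 0.6, 0.05, 0.05],
    [0.3, 0.3, 0.1, 0.1, 0.1,  0.1],
]
print(select_critical_tokens(attn, k=3))  # → [0, 1, 3]
```

The surviving tokens are preserved in the SSM's context while the rest are filtered, which is what mitigates the exponential memory decay described above.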
```bash
bash ./build_env.sh
```
This script will create a dedicated Python environment and install all required packages.
This section details how to evaluate LAMB.
The `run_eval.py` script accepts the following main arguments:
| Flag | Description | Default |
|---|---|---|
| `-d`, `--device` | CUDA device ID. | `'0'` |
| `--model` | Path to the pre-trained model, or model name on Hugging Face. | `"state-spaces/mamba2-1.3b"` |
| `--config` | Path to the remapping configuration JSON file for LAMB (`None` for the vanilla model). | `None` |
| `-lt`, `--long_eval_task` | LongBench evaluation task to run. | `'no'` |
| `--helmet_config` | Helmet evaluation task to run (`None` for no Helmet task). | `None` |
| `--sample_path` | Path to the `.txt` file for perplexity calculation. Used with the perplexity task `--ppl`. | `"subseq_lambada.txt"` |
| `--ppl` | Enable the perplexity task on special input. | `False` |
To run LongBench tasks, set the `--long_eval_task` argument to `yes` (LongBench), `e` (LongBench-E), or `c` (a subset).
```bash
# vanilla
CUDA_VISIBLE_DEVICES=0 python run_eval.py \
--model state-spaces/mamba2-1.3b \
--long_eval_task c \
--device 0
# ours
CUDA_VISIBLE_DEVICES=0 python run_eval.py \
--model state-spaces/mamba2-1.3b \
--config ./remapping_configs/mamba2-1.3b_topk1024_pk9.json \
--long_eval_task c \
--device 0
```
To run Helmet tasks, set the `--helmet_config` argument to a config path (under the `./helmet/configs/` folder) and set `--helmet_output_dir` to specify the output path.
```bash
# vanilla
CUDA_VISIBLE_DEVICES=0 python run_eval.py \
--model state-spaces/mamba2-1.3b \
--helmet_config ./helmet/configs/longqa_short.yaml \
--helmet_output_dir ./helmet/output \
--device 0
# ours
CUDA_VISIBLE_DEVICES=0 python run_eval.py \
--model state-spaces/mamba2-1.3b \
--config ./remapping_configs/mamba2-1.3b_topk1024_pk9.json \
--helmet_config ./helmet/configs/longqa_short.yaml \
--helmet_output_dir ./helmet/output \
--device 0
```
To calculate perplexity on a custom text file, add `--ppl` and specify the input file with `--sample_path`.
```bash
# vanilla
CUDA_VISIBLE_DEVICES=0 python run_eval.py \
--model state-spaces/mamba2-1.3b \
--sample_path ./subseq_lambada.txt \
--device 0 \
--ppl
# ours
CUDA_VISIBLE_DEVICES=0 python run_eval.py \
--model state-spaces/mamba2-1.3b \
--config ./remapping_configs/mamba2-1.3b_topk1024_pk9.json \
--sample_path ./subseq_lambada.txt \
--device 0 \
--ppl
```
This “research quality code” is for Non-Commercial purposes and provided by the contributors “As Is” without any express or implied warranty of any kind. The organizations (Georgia Tech or Intel) involved do not own the rights to the data sets used and do not confer any rights to it. The organizations (Georgia Tech or Intel) do not warrant or assume responsibility for the accuracy or completeness of any information, text, graphics, links or other items within the code. A thorough security review has not been performed on this code. Additionally, this repository may contain components that are out of date or contain known security vulnerabilities.
If you find our work valuable, please consider citing our paper:
@inproceedings{ye2025longmamba,
title={LAMB: A Training-Free Method to Enhance the Long-Context Understanding of SSMs via Attention-Guided Token Filtering},
author={Zhifan Ye and Zheng Wang and Kejing Xia and Jihoon Hong and Leshu Li and Lexington Whalen and Cheng Wan and Yonggan Fu and Yingyan Celine Lin and Souvik Kundu},
booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics},
year={2025}
}