
LAMB

LAMB: A Training-Free Method to Enhance the Long-Context Understanding of SSMs via Attention-Guided Token Filtering

Paper | Poster | Slides

Intro

LAMB is a training-free method that significantly improves the long-context understanding of Mamba models through attention-guided token filtering.

Our analysis reveals that the performance degradation of state space models (SSMs) on long contexts primarily stems from the exponential decay of their hidden-state memory, which can be effectively mitigated by preserving a small subset of critical tokens identified via their attention patterns. Building on this insight, LAMB introduces an attention-based metric for token selection, substantially improving the retention of critical context. Extensive evaluations show that LAMB achieves up to a 30.35% improvement over previous state-of-the-art techniques across a range of long-context benchmarks.
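For intuition, the core mechanism can be pictured as a top-k token filter. The sketch below is illustrative only, not the authors' implementation: the scoring function, tensor shapes, and `k` are all assumptions. Tokens are ranked by an attention-derived importance score, and only the highest-scoring ones are retained.

```python
# Illustrative sketch (not LAMB's actual code): keep the top-k tokens
# ranked by an attention-derived importance score, so that critical
# context is not crowded out of the SSM's decaying hidden-state memory.
import torch

def filter_tokens(hidden_states: torch.Tensor, scores: torch.Tensor, k: int) -> torch.Tensor:
    """hidden_states: (seq_len, d_model); scores: (seq_len,) per-token importance."""
    k = min(k, scores.numel())
    keep = torch.topk(scores, k).indices.sort().values  # preserve original token order
    return hidden_states[keep]

# Toy usage: 8 tokens, keep the 4 highest-scoring ones.
h = torch.randn(8, 16)
s = torch.rand(8)
print(filter_tokens(h, s, 4).shape)  # torch.Size([4, 16])
```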

Performance

Getting Started

1. Install Environment

```bash
bash ./build_env.sh
```

This script will create a dedicated Python environment and install all required packages.

Evaluation

This section details how to evaluate LAMB.

2. Configuration Parameters

The `run_eval.py` script accepts the following main arguments:

| Flag | Description | Default |
| --- | --- | --- |
| `-d, --device` | CUDA device ID. | `'0'` |
| `--model` | Path to a pre-trained model, or a model name from Hugging Face. | `"state-spaces/mamba2-1.3b"` |
| `--config` | Path to the LAMB remapping configuration JSON file (`None` runs the vanilla model). | `None` |
| `-lt, --long_eval_task` | LongBench evaluation task to run. | `'no'` |
| `--helmet_config` | Path to a HELMET task configuration (`None` skips HELMET). | `None` |
| `--helmet_output_dir` | Output directory for HELMET results (used with `--helmet_config`). | — |
| `--sample_path` | Path to a `.txt` file for perplexity calculation (used with `--ppl`). | `"subseq_lambada.txt"` |
| `--ppl` | Enable the perplexity task on the given input. | `False` |
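The remapping configuration passed via `--config` is a JSON file whose schema is not documented in this README. A quick way to see what a given config defines is to load it directly; here is a minimal sketch, assuming the config path used in the examples below:

```python
# Peek at a LAMB remapping config before passing it to run_eval.py
# via --config. The JSON schema is undocumented here, so we only list
# whatever top-level keys the file happens to define.
import json

with open("./remapping_configs/mamba2-1.3b_topk1024_pk9.json") as f:
    cfg = json.load(f)

print("Top-level keys:", list(cfg.keys()))
```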

3. Running Evaluations

3.1 LongBench Evaluation

To run LongBench tasks, set the `--long_eval_task` argument to `yes` (LongBench), `e` (LongBench-E), or `c` (a subset).

```bash
# vanilla
CUDA_VISIBLE_DEVICES=0 python run_eval.py \
  --model state-spaces/mamba2-1.3b \
  --long_eval_task c \
  --device 0

# ours
CUDA_VISIBLE_DEVICES=0 python run_eval.py \
  --model state-spaces/mamba2-1.3b \
  --config ./remapping_configs/mamba2-1.3b_topk1024_pk9.json \
  --long_eval_task c \
  --device 0
```

3.2 HELMET Evaluation

To run HELMET tasks, set the `--helmet_config` argument to a configuration file under `./helmet/configs/`, and use `--helmet_output_dir` to specify the output directory.

```bash
# vanilla
CUDA_VISIBLE_DEVICES=0 python run_eval.py \
  --model state-spaces/mamba2-1.3b \
  --helmet_config ./helmet/configs/longqa_short.yaml \
  --helmet_output_dir ./helmet/output \
  --device 0

# ours
CUDA_VISIBLE_DEVICES=0 python run_eval.py \
  --model state-spaces/mamba2-1.3b \
  --config ./remapping_configs/mamba2-1.3b_topk1024_pk9.json \
  --helmet_config ./helmet/configs/longqa_short.yaml \
  --helmet_output_dir ./helmet/output \
  --device 0
```

3.3 Custom Perplexity Calculation

To calculate perplexity on a custom text file, add `--ppl` and specify the input file with `--sample_path`.

```bash
# vanilla
CUDA_VISIBLE_DEVICES=0 python run_eval.py \
  --model state-spaces/mamba2-1.3b \
  --sample_path ./subseq_lambada.txt \
  --device 0 \
  --ppl

# ours
CUDA_VISIBLE_DEVICES=0 python run_eval.py \
  --model state-spaces/mamba2-1.3b \
  --config ./remapping_configs/mamba2-1.3b_topk1024_pk9.json \
  --sample_path ./subseq_lambada.txt \
  --device 0 \
  --ppl
```
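For reference, perplexity is conventionally defined as the exponential of the mean next-token negative log-likelihood. The sketch below is a generic illustration of that formula, not necessarily the repository's exact `--ppl` implementation.

```python
# Generic perplexity sketch: perplexity = exp(mean cross-entropy).
import torch
import torch.nn.functional as F

def perplexity(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """logits: (seq_len, vocab_size); targets: (seq_len,) next-token ids."""
    nll = F.cross_entropy(logits, targets, reduction="mean")
    return torch.exp(nll).item()

# Toy usage with random logits over a 100-token vocabulary.
logits = torch.randn(12, 100)
targets = torch.randint(0, 100, (12,))
print(perplexity(logits, targets))
```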

Disclaimer:

This “research quality code” is for Non-Commercial purposes and provided by the contributors “As Is” without any express or implied warranty of any kind. The organizations (Georgia Tech or Intel) involved do not own the rights to the data sets used and do not confer any rights to it. The organizations (Georgia Tech or Intel) do not warrant or assume responsibility for the accuracy or completeness of any information, text, graphics, links or other items within the code. A thorough security review has not been performed on this code. Additionally, this repository may contain components that are out of date or contain known security vulnerabilities.

Citation

If you find our work valuable, please consider citing our paper:

@inproceedings{ye2025longmamba,
  title={LAMB: A Training-Free Method to Enhance the Long-Context Understanding of SSMs via Attention-Guided Token Filtering},
  author={Zhifan Ye and Zheng Wang and Kejing Xia and Jihoon Hong and Leshu Li and Lexington Whalen and Cheng Wan and Yonggan Fu and Yingyan Celine Lin and Souvik Kundu},
  booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics},
  year={2025}
}
