llama-quant : fix the verification of attention layers for encoder-decoder models #16023
llama.cpp fails to quantize T5 models that have an unequal number of encoder and decoder blocks. The failure is caused by this assertion in the verification of attention layers:
```cpp
GGML_ASSERT((qs.n_attention_wv == n_attn_layer - pruned_attention_w) && "n_attention_wv is unexpected");
```
The original verification assumed that the encoder and the decoder have the same number of blocks: for a model with n blocks in each, it expected 3 * n attention layers in total (n encoder self-attention, n decoder self-attention, and n decoder cross-attention layers). When the encoder and decoder block counts differ, this expected count no longer matches the tensors actually present in the model, so the assertion fires.
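As a minimal standalone sketch of the corrected accounting (the helper name and parameters below are illustrative, not the exact patch), the check should derive the expected layer count from the encoder and decoder block counts independently:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: expected number of attention layers in an
// encoder-decoder model. Each encoder block has one self-attention layer;
// each decoder block has one self-attention and one cross-attention layer.
static int32_t expected_attn_layers(int32_t n_enc_layer, int32_t n_dec_layer) {
    // old logic: assumed n_dec_layer == n_enc_layer and used 3 * n_enc_layer
    // corrected logic: count encoder and decoder contributions separately
    return n_enc_layer + 2 * n_dec_layer;
}

int main() {
    // equal block counts: the old 3*n formula and the corrected one agree
    assert(expected_attn_layers(8, 8) == 3 * 8);
    // unequal block counts (e.g. 12 encoder / 4 decoder): only the
    // corrected count matches the tensors present in the model
    assert(expected_attn_layers(12, 4) == 12 + 2 * 4);
    return 0;
}
```

With equal block counts the two formulas coincide, which is why the bug only surfaced on models with unequal encoder and decoder depths.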
Testing: flan-t5-small, t5-small, and a T5 model with an unequal number of encoder and decoder blocks.