Faster IQ3_KT and IQ4_KT #453

ikawrakow · 2025-05-24T07:01:55Z

The PR improves AVX2 performance for the trellis quants IQ3_KT and IQ4_KT recently added in PR #441.
The results below are for LLaMA-3.1-8B on a Ryzen-5975WX CPU.

IQ3_KT

N_KV	S_PP t/s (main)	S_PP t/s (PR)	PP speedup	S_TG t/s (main)	S_TG t/s (PR)	TG speedup
0	61.98	71.59	1.155	11.17	13.30	1.191
512	61.27	70.79	1.155	11.10	13.19	1.188
1024	60.48	69.93	1.156	11.04	13.10	1.187
1536	59.94	69.15	1.154	10.95	12.96	1.184
2048	59.48	68.55	1.152	10.87	12.85	1.182

IQ4_KT

N_KV	S_PP t/s (main)	S_PP t/s (PR)	PP speedup	S_TG t/s (main)	S_TG t/s (PR)	TG speedup
0	44.32	64.91	1.465	9.36	11.69	1.249
512	43.90	64.12	1.461	9.26	11.56	1.248
1024	43.60	63.39	1.454	9.19	11.47	1.248
1536	43.32	62.86	1.451	9.12	11.37	1.247
2048	43.07	62.37	1.448	9.06	11.28	1.245
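
For readers who want to check the speedup columns, each entry is simply the PR throughput divided by the main-branch throughput. Here is a minimal sketch using the IQ4_KT prompt-processing rows from the table above; the function and variable names are illustrative only and are not part of the PR.

```python
# (N_KV, S_PP t/s on main, S_PP t/s with this PR), copied from the IQ4_KT table.
IQ4_KT_PP = [
    (0, 44.32, 64.91),
    (512, 43.90, 64.12),
    (1024, 43.60, 63.39),
    (1536, 43.32, 62.86),
    (2048, 43.07, 62.37),
]


def speedup(main_tps: float, pr_tps: float) -> float:
    """Ratio of PR throughput to main-branch throughput (tokens/second)."""
    return pr_tps / main_tps


for n_kv, main_tps, pr_tps in IQ4_KT_PP:
    # Three decimals, matching the precision used in the tables.
    print(f"N_KV={n_kv:5d}  PP speedup={speedup(main_tps, pr_tps):.3f}")
```

The same ratio applied to the S_TG columns reproduces the TG speedup values.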

CPU performance is still much lower than for other quantization types. But memory bandwidth is far from saturated, so PP and TG will be better on a faster CPU with more cores.
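
The bandwidth remark can be sanity-checked with a rough back-of-the-envelope estimate: during token generation, each token requires reading roughly all model weights once, so TG t/s times the weight size gives an approximate memory traffic rate. The sketch below rests on assumptions that are not stated in this PR: about 8.03e9 parameters for LLaMA-3.1-8B, a nominal 4.0 bits per weight for IQ4_KT applied to every tensor, and a theoretical peak of 204.8 GB/s for an 8-channel DDR4-3200 configuration. The actual memory setup of the test machine is not given here, so treat the result as an order-of-magnitude indication only.

```python
# All constants below are ASSUMPTIONS for illustration, not values from the PR.
N_PARAMS = 8.03e9          # assumed parameter count of LLaMA-3.1-8B
BITS_PER_WEIGHT = 4.0      # assumed nominal bpw for IQ4_KT, applied to all tensors
PEAK_BANDWIDTH_GBS = 204.8 # assumed peak: 8 channels x 3200 MT/s x 8 bytes


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8.0


def effective_bandwidth_gbs(tg_tps: float, n_params: float, bits_per_weight: float) -> float:
    """Approximate weight traffic in GB/s if every token reads all weights once."""
    return tg_tps * weight_bytes(n_params, bits_per_weight) / 1e9


# TG throughput with this PR at N_KV = 0, taken from the IQ4_KT table.
tg_tps = 11.69
used = effective_bandwidth_gbs(tg_tps, N_PARAMS, BITS_PER_WEIGHT)
print(f"approx. weight traffic: {used:.1f} GB/s")
print(f"fraction of assumed peak: {used / PEAK_BANDWIDTH_GBS:.2f}")
```

Under these assumptions the estimated traffic is well below the assumed peak, which is consistent with the statement that token generation is compute-bound rather than bandwidth-bound for these quants.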

Iwan Kawrakow added 6 commits May 23, 2025 13:19

Somewhat faster iq3_kt (AVX2)

1500060

Cleanup

31988c7

Slightly faster iq4_kt

48c0b7d

Slightly faster iq4_kt

fb254f0

PP is now almost 50% better than original, TG is ~20% better

Cleanup

5929faf

Very slightly faster iq4_kt TG

3fe6c0a

ikawrakow merged commit a2c42f9 into main May 24, 2025

ikawrakow mentioned this pull request Jun 1, 2025

Trellis quantization #113

Closed
