Conversation

@ikawrakow (Owner) commented on Mar 9, 2025

This PR adds special purpose matrix-vector multiplications for MoE models.

For DeepSeek-Lite this results in a ~25% speedup for token generation.

For now it is only available with the -fmoe option, and only for quantized experts.

@ikawrakow force-pushed the ik/cuda_faster_moe_tg branch from cb1636b to 90ab066 on March 9, 2025 at 14:56
@ikawrakow merged commit 699c9cb into main on Mar 10, 2025