
Conversation

drisspg (Contributor) commented May 12, 2025

Stacked PRs:

- Add mx_fp4 path

BF16 70B MLP: https://fburl.com/aeqm5s4v ~ 1300 us
MXFP8 70B MLP: https://fburl.com/uxgoju4r ~ 723 us
MXFP4 70B MLP: https://fburl.com/u95f6f39 ~ 600 us

Looks like we need to do some more kernel tuning, since the gemm kernel accounts for only about 70% of the execution time instead of ~50%.
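As a quick sanity check of that ~70% figure, something like the following sums CUDA kernel time and pulls out the gemm share. This is a minimal sketch against a generic bf16 MLP; the layer sizes and the "gemm"/"matmul" kernel-name filter are illustrative assumptions, not the exact 70B MLP benchmarked above.

# Minimal sketch: what fraction of CUDA time do the matmul kernels take?
# Layer sizes and the kernel-name filter are illustrative assumptions.
import torch
from torch.profiler import profile, ProfilerActivity

mlp = torch.nn.Sequential(
    torch.nn.Linear(8192, 28672, bias=False),
    torch.nn.SiLU(),
    torch.nn.Linear(28672, 8192, bias=False),
).cuda().bfloat16()
x = torch.randn(16, 8192, device="cuda", dtype=torch.bfloat16)

for _ in range(3):  # warmup
    mlp(x)
torch.cuda.synchronize()

with profile(activities=[ProfilerActivity.CUDA]) as prof:
    for _ in range(10):
        mlp(x)
    torch.cuda.synchronize()

events = prof.key_averages()
total = sum(e.self_cuda_time_total for e in events)
gemm = sum(e.self_cuda_time_total for e in events
           if "gemm" in e.key.lower() or "matmul" in e.key.lower())
print(f"gemm share of CUDA time: {gemm / total:.1%}")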

Benchmarks

Baseline Model
Throughput: 56.68 requests/s, 24053.96 total tokens/s, 11590.20 output tokens/s
Total num prompt tokens: 225190
Total num output tokens: 209407

python /home/drisspg/meta/vllm/benchmarks/benchmark_throughput.py \
 --backend vllm \
 --model "Qwen/Qwen2-7B-Instruct" \
 --dataset-name sharegpt \
 --dataset-path /home/drisspg/meta/scripts/data/ShareGPT_V3_unfiltered_cleaned_split.json \
 --num-prompts 1024 \
 --disable-log-stats \
 --gpu-memory-utilization=0.9 \
 --seed 42

MXFP8:
Throughput: 50.52 requests/s, 21443.10 total tokens/s, 10332.18 output tokens/s
Total num prompt tokens: 225190
Total num output tokens: 209407

python /home/drisspg/meta/vllm/benchmarks/benchmark_throughput.py \
 --backend vllm \
 --model "/home/drisspg/meta/scripts/data/mxfp8-Qwen2-7B-Instruct" \
 --dataset-name sharegpt \
 --dataset-path /home/drisspg/meta/scripts/data/ShareGPT_V3_unfiltered_cleaned_split.json \
 --num-prompts 1024 \
 --disable-log-stats \
 --gpu-memory-utilization=0.9 \
 --seed 42

MXFP4:

Throughput: 56.64 requests/s, 24039.96 total tokens/s, 11583.46 output tokens/s
Total num prompt tokens: 225190
Total num output tokens: 209407

python /home/drisspg/meta/vllm/benchmarks/benchmark_throughput.py \
 --backend vllm \
 --model "/home/drisspg/meta/scripts/data/mxfp4-Qwen2-7B-Instruct" \
 --dataset-name sharegpt \
 --dataset-path /home/drisspg/meta/scripts/data/ShareGPT_V3_unfiltered_cleaned_split.json \
 --num-prompts 1024 \
 --disable-log-stats \
 --gpu-memory-utilization=0.9 \
 --seed 42
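For context, the mxfp4-Qwen2-7B-Instruct checkpoint above is produced offline before being handed to vLLM. A rough sketch of that flow with torchao is below; the MX inference config name and its arguments are assumptions for illustration, not necessarily the exact API added in this PR.

# Rough sketch: quantize a HF checkpoint to MXFP4 with torchao, then save it for vLLM.
# The config class below is a placeholder name; check torchao.prototype.mx_formats for the real one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from torchao.quantization import quantize_
from torchao.prototype.mx_formats import MXFPInferenceConfig  # name is an assumption

model_id = "Qwen/Qwen2-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Swap nn.Linear weights for block-scaled fp4 (MX block size 32) tensors.
quantize_(model, MXFPInferenceConfig())  # exact config/kwargs are assumptions

out_dir = "/home/drisspg/meta/scripts/data/mxfp4-Qwen2-7B-Instruct"
model.save_pretrained(out_dir, safe_serialization=True)
tokenizer.save_pretrained(out_dir)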

70B

Base
Throughput: 26.28 requests/s, 11154.41 total tokens/s, 5374.66 output tokens/s
Total num prompt tokens: 225190
Total num output tokens: 209407

VLLM_USE_V1=1 VLLM_DISABLE_COMPILE_CACHE=1 python /home/drisspg/meta/vllm/benchmarks/benchmark_throughput.py \
 --backend vllm \
 --model "Qwen/Qwen2.5-72B" \
 --dataset-name sharegpt \
 --dataset-path /home/drisspg/meta/scripts/data/ShareGPT_V3_unfiltered_cleaned_split.json \
 --num-prompts 1024 \
 --disable-log-stats \
 --gpu-memory-utilization=0.9 \
 --tensor-parallel-size 8 \
 --seed 42

MXFP4
Throughput: 25.96 requests/s, 11018.18 total tokens/s, 5309.02 output tokens/s
Total num prompt tokens: 225190
Total num output tokens: 209407

VLLM_USE_V1=1 VLLM_DISABLE_COMPILE_CACHE=1 python /home/drisspg/meta/vllm/benchmarks/benchmark_throughput.py \
 --backend vllm \
 --model "data/mxfp4-Qwen2.5-72B" \
 --dataset-name sharegpt \
 --dataset-path /home/drisspg/meta/scripts/data/ShareGPT_V3_unfiltered_cleaned_split.json \
 --num-prompts 1024 \
 --disable-log-stats \
 --gpu-memory-utilization=0.9 \
 --tensor-parallel-size 8 \
 --seed 42
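Reading the end-to-end numbers above in relative terms (ratios computed directly from the requests/s figures reported in this comment):

# Relative throughput, requests/s, from the benchmark runs above.
baseline_7b, mxfp8_7b, mxfp4_7b = 56.68, 50.52, 56.64
baseline_70b, mxfp4_70b = 26.28, 25.96
print(f"7B  MXFP8 vs BF16: {mxfp8_7b / baseline_7b:.1%}")    # ~89.1%
print(f"7B  MXFP4 vs BF16: {mxfp4_7b / baseline_7b:.1%}")    # ~99.9%
print(f"70B MXFP4 vs BF16: {mxfp4_70b / baseline_70b:.1%}")  # ~98.8%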

pytorch-bot bot commented May 12, 2025

🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2201

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

drisspg added a commit that referenced this pull request May 12, 2025
stack-info: PR: #2201, branch: drisspg/stack/54
drisspg force-pushed the drisspg/stack/54 branch from 1ca9939 to ff25836 on May 12, 2025 20:38
facebook-github-bot added the CLA Signed label on May 12, 2025
drisspg added the mx label on May 12, 2025
drisspg requested review from vkuzo and danielvegamyhre and removed the request for vkuzo on May 12, 2025 23:17
drisspg added the topic: new feature label on May 13, 2025
drisspg added a commit that referenced this pull request May 13, 2025
stack-info: PR: #2201, branch: drisspg/stack/54
drisspg force-pushed the drisspg/stack/54 branch from ff25836 to 23ba26b on May 13, 2025 18:55
drisspg added a commit that referenced this pull request May 13, 2025
stack-info: PR: #2201, branch: drisspg/stack/54
drisspg force-pushed the drisspg/stack/54 branch from 23ba26b to 52b1682 on May 13, 2025 19:00
drisspg added a commit that referenced this pull request May 13, 2025
stack-info: PR: #2201, branch: drisspg/stack/54
drisspg force-pushed the drisspg/stack/54 branch from 52b1682 to e3e2ca6 on May 13, 2025 19:22
drisspg added a commit that referenced this pull request May 13, 2025
stack-info: PR: #2201, branch: drisspg/stack/54
drisspg force-pushed the drisspg/stack/54 branch from e3e2ca6 to 7d3bffe on May 13, 2025 19:33
stack-info: PR: #2201, branch: drisspg/stack/54
drisspg force-pushed the drisspg/stack/54 branch from 7d3bffe to f1eaae9 on May 13, 2025 20:18
drisspg merged commit 4bfd7c0 into main on May 13, 2025
3 checks passed
vkuzo added a commit that referenced this pull request May 14, 2025
Summary:

#2201 broke CI:
1. some MX tests for fp4 are running on A10G instances, with skipping
   not being properly applied
   (https://hud.pytorch.org/pytorch/ao/commit/4bfd7c09ef4592eacbbf990aea6d6bda608865c1#42164784332-box)
2. some SQNR thresholds were too tight for fp4
   (https://hud.pytorch.org/pytorch/ao/commit/4bfd7c09ef4592eacbbf990aea6d6bda608865c1#42164784332-box)

This PR fixes both of these to get CI back to green (I hope). Note that
I can't repro 1 locally, so we'll have to land and see if it works.

Test Plan: CI

Reviewers:

Subscribers:

Tasks:

Tags:
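For reference on point 2: SQNR here is the signal-to-quantization-noise ratio in dB, 10*log10(mean(x^2) / mean((x - x_q)^2)), and fp4's larger quantization error naturally lands at a lower SQNR than fp8, so the test thresholds have to be looser. A minimal sketch of the metric itself follows, using simulated quantization noise rather than torchao's actual MXFP4 path.

# Minimal sketch: SQNR in dB between a reference tensor and an approximation of it.
# The injected noise levels are stand-ins to illustrate the metric, not real fp8/fp4 error.
import torch

def sqnr(ref: torch.Tensor, approx: torch.Tensor) -> float:
    signal = ref.pow(2).mean()
    noise = (ref - approx).pow(2).mean()
    return (10 * torch.log10(signal / noise)).item()

x = torch.randn(4096, 4096)
x_fp8_like = x + 0.01 * x.abs().mean() * torch.randn_like(x)
x_fp4_like = x + 0.15 * x.abs().mean() * torch.randn_like(x)
print(f"fp8-like SQNR: {sqnr(x, x_fp8_like):.1f} dB")  # higher
print(f"fp4-like SQNR: {sqnr(x, x_fp4_like):.1f} dB")  # lower -> looser test threshold needed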
drisspg pushed a commit that referenced this pull request May 14, 2025
liangel-02 pushed a commit that referenced this pull request Aug 25, 2025
stack-info: PR: #2201, branch: drisspg/stack/54
liangel-02 pushed a commit that referenced this pull request Aug 25, 2025