
Conversation

drisspg (Contributor) commented May 12, 2025

Stacked PRs:

- Add mx_fp4 path

BF16 70B MLP: https://fburl.com/aeqm5s4v ~ 1300 us
MXFP8 70B MLP: https://fburl.com/uxgoju4r ~ 723 us
MXFP4 70B MLP: https://fburl.com/u95f6f39 ~ 600 us

Looks like we need to do some more kernel tuning, since the gemm kernel accounts for only about 70% of the execution time instead of ~50%.
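As a quick sanity check of that ~70% figure, something like the following sums CUDA kernel time and pulls out the gemm share. This is a minimal sketch against a generic bf16 MLP; the layer sizes and the "gemm"/"matmul" kernel-name filter are illustrative assumptions, not the exact 70B MLP benchmarked above.

# Minimal sketch: what fraction of CUDA time do the matmul kernels take?
# Layer sizes and the kernel-name filter are illustrative assumptions.
import torch
from torch.profiler import profile, ProfilerActivity

mlp = torch.nn.Sequential(
    torch.nn.Linear(8192, 28672, bias=False),
    torch.nn.SiLU(),
    torch.nn.Linear(28672, 8192, bias=False),
).cuda().bfloat16()
x = torch.randn(16, 8192, device="cuda", dtype=torch.bfloat16)

for _ in range(3):  # warmup
    mlp(x)
torch.cuda.synchronize()

with profile(activities=[ProfilerActivity.CUDA]) as prof:
    for _ in range(10):
        mlp(x)
    torch.cuda.synchronize()

events = prof.key_averages()
total = sum(e.self_cuda_time_total for e in events)
gemm = sum(e.self_cuda_time_total for e in events
           if "gemm" in e.key.lower() or "matmul" in e.key.lower())
print(f"gemm share of CUDA time: {gemm / total:.1%}")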

Benchmarks

Baseline Model
Throughput: 56.68 requests/s, 24053.96 total tokens/s, 11590.20 output tokens/s
Total num prompt tokens: 225190
Total num output tokens: 209407

python /home/drisspg/meta/vllm/benchmarks/benchmark_throughput.py \
 --backend vllm \
 --model "Qwen/Qwen2-7B-Instruct" \
 --dataset-name sharegpt \
 --dataset-path /home/drisspg/meta/scripts/data/ShareGPT_V3_unfiltered_cleaned_split.json \
 --num-prompts 1024 \
 --disable-log-stats \
 --gpu-memory-utilization=0.9 \
 --seed 42

MXFP8:
Throughput: 50.52 requests/s, 21443.10 total tokens/s, 10332.18 output tokens/s
Total num prompt tokens: 225190
Total num output tokens: 209407

python /home/drisspg/meta/vllm/benchmarks/benchmark_throughput.py \
 --backend vllm \
 --model "/home/drisspg/meta/scripts/data/mxfp8-Qwen2-7B-Instruct" \
 --dataset-name sharegpt \
 --dataset-path /home/drisspg/meta/scripts/data/ShareGPT_V3_unfiltered_cleaned_split.json \
 --num-prompts 1024 \
 --disable-log-stats \
 --gpu-memory-utilization=0.9 \
 --seed 42

MXFP4:

Throughput: 56.64 requests/s, 24039.96 total tokens/s, 11583.46 output tokens/s
Total num prompt tokens: 225190
Total num output tokens: 209407

python /home/drisspg/meta/vllm/benchmarks/benchmark_throughput.py \
 --backend vllm \
 --model "/home/drisspg/meta/scripts/data/mxfp4-Qwen2-7B-Instruct" \
 --dataset-name sharegpt \
 --dataset-path /home/drisspg/meta/scripts/data/ShareGPT_V3_unfiltered_cleaned_split.json \
 --num-prompts 1024 \
 --disable-log-stats \
 --gpu-memory-utilization=0.9 \
 --seed 42
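For context, the mxfp4-Qwen2-7B-Instruct checkpoint above is produced offline before being handed to vLLM. A rough sketch of that flow with torchao is below; the MX inference config name and its arguments are assumptions for illustration, not necessarily the exact API added in this PR.

# Rough sketch: quantize a HF checkpoint to MXFP4 with torchao, then save it for vLLM.
# The config class below is a placeholder name; check torchao.prototype.mx_formats for the real one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from torchao.quantization import quantize_
from torchao.prototype.mx_formats import MXFPInferenceConfig  # name is an assumption

model_id = "Qwen/Qwen2-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Swap nn.Linear weights for block-scaled fp4 (MX block size 32) tensors.
quantize_(model, MXFPInferenceConfig())  # exact config/kwargs are assumptions

out_dir = "/home/drisspg/meta/scripts/data/mxfp4-Qwen2-7B-Instruct"
model.save_pretrained(out_dir, safe_serialization=True)
tokenizer.save_pretrained(out_dir)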

70B

Base
Throughput: 26.28 requests/s, 11154.41 total tokens/s, 5374.66 output tokens/s
Total num prompt tokens: 225190
Total num output tokens: 209407

VLLM_USE_V1=1 VLLM_DISABLE_COMPILE_CACHE=1 python /home/drisspg/meta/vllm/benchmarks/benchmark_throughput.py \
 --backend vllm \
 --model "Qwen/Qwen2.5-72B" \
 --dataset-name sharegpt \
 --dataset-path /home/drisspg/meta/scripts/data/ShareGPT_V3_unfiltered_cleaned_split.json \
 --num-prompts 1024 \
 --disable-log-stats \
 --gpu-memory-utilization=0.9 \
 --tensor-parallel-size 8 \
 --seed 42

MXFP4
Throughput: 25.96 requests/s, 11018.18 total tokens/s, 5309.02 output tokens/s
Total num prompt tokens: 225190
Total num output tokens: 209407

VLLM_USE_V1=1 VLLM_DISABLE_COMPILE_CACHE=1 python /home/drisspg/meta/vllm/benchmarks/benchmark_throughput.py \
 --backend vllm \
 --model "data/mxfp4-Qwen2.5-72B" \
 --dataset-name sharegpt \
 --dataset-path /home/drisspg/meta/scripts/data/ShareGPT_V3_unfiltered_cleaned_split.json \
 --num-prompts 1024 \
 --disable-log-stats \
 --gpu-memory-utilization=0.9 \
 --tensor-parallel-size 8 \
 --seed 42
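Reading the end-to-end numbers above in relative terms (ratios computed directly from the requests/s figures reported in this comment):

# Relative throughput, requests/s, from the benchmark runs above.
baseline_7b, mxfp8_7b, mxfp4_7b = 56.68, 50.52, 56.64
baseline_70b, mxfp4_70b = 26.28, 25.96
print(f"7B  MXFP8 vs BF16: {mxfp8_7b / baseline_7b:.1%}")    # ~89.1%
print(f"7B  MXFP4 vs BF16: {mxfp4_7b / baseline_7b:.1%}")    # ~99.9%
print(f"70B MXFP4 vs BF16: {mxfp4_70b / baseline_70b:.1%}")  # ~98.8%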

pytorch-bot bot commented May 12, 2025

🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2201

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

drisspg added a commit that referenced this pull request May 12, 2025
stack-info: PR: #2201, branch: drisspg/stack/54
drisspg force-pushed the drisspg/stack/54 branch from 1ca9939 to ff25836 on May 12, 2025 20:38
facebook-github-bot added the CLA Signed label on May 12, 2025
drisspg added the mx label on May 12, 2025
drisspg requested review from vkuzo and danielvegamyhre and removed the request for vkuzo on May 12, 2025 23:17
drisspg added the topic: new feature label on May 13, 2025
drisspg added a commit that referenced this pull request May 13, 2025
stack-info: PR: #2201, branch: drisspg/stack/54
drisspg force-pushed the drisspg/stack/54 branch from ff25836 to 23ba26b on May 13, 2025 18:55
drisspg added a commit that referenced this pull request May 13, 2025
stack-info: PR: #2201, branch: drisspg/stack/54
drisspg force-pushed the drisspg/stack/54 branch from 23ba26b to 52b1682 on May 13, 2025 19:00
drisspg added a commit that referenced this pull request May 13, 2025
stack-info: PR: #2201, branch: drisspg/stack/54
drisspg force-pushed the drisspg/stack/54 branch from 52b1682 to e3e2ca6 on May 13, 2025 19:22
drisspg added a commit that referenced this pull request May 13, 2025
stack-info: PR: #2201, branch: drisspg/stack/54
drisspg force-pushed the drisspg/stack/54 branch from e3e2ca6 to 7d3bffe on May 13, 2025 19:33
stack-info: PR: #2201, branch: drisspg/stack/54
drisspg force-pushed the drisspg/stack/54 branch from 7d3bffe to f1eaae9 on May 13, 2025 20:18
drisspg merged commit 4bfd7c0 into main on May 13, 2025
3 checks passed
vkuzo added a commit that referenced this pull request May 14, 2025
Summary:

#2201 broke CI:
1. some MX tests for fp4 are running on A10G instances, with skipping
   not being properly applied
   (https://hud.pytorch.org/pytorch/ao/commit/4bfd7c09ef4592eacbbf990aea6d6bda608865c1#42164784332-box)
2. some SQNR thresholds were too tight for fp4
   (https://hud.pytorch.org/pytorch/ao/commit/4bfd7c09ef4592eacbbf990aea6d6bda608865c1#42164784332-box)

This PR fixes both of these to get CI back to green (I hope). Note that
I can't repro 1 locally, so we'll have to land and see if it works.

Test Plan: CI

Reviewers:

Subscribers:

Tasks:

Tags:
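For reference on point 2: SQNR here is the signal-to-quantization-noise ratio in dB, 10*log10(mean(x^2) / mean((x - x_q)^2)), and fp4's larger quantization error naturally lands at a lower SQNR than fp8, so the test thresholds have to be looser. A minimal sketch of the metric itself follows, using simulated quantization noise rather than torchao's actual MXFP4 path.

# Minimal sketch: SQNR in dB between a reference tensor and an approximation of it.
# The injected noise levels are stand-ins to illustrate the metric, not real fp8/fp4 error.
import torch

def sqnr(ref: torch.Tensor, approx: torch.Tensor) -> float:
    signal = ref.pow(2).mean()
    noise = (ref - approx).pow(2).mean()
    return (10 * torch.log10(signal / noise)).item()

x = torch.randn(4096, 4096)
x_fp8_like = x + 0.01 * x.abs().mean() * torch.randn_like(x)
x_fp4_like = x + 0.15 * x.abs().mean() * torch.randn_like(x)
print(f"fp8-like SQNR: {sqnr(x, x_fp8_like):.1f} dB")  # higher
print(f"fp4-like SQNR: {sqnr(x, x_fp4_like):.1f} dB")  # lower -> looser test threshold needed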
drisspg pushed a commit that referenced this pull request May 14, 2025
liangel-02 pushed a commit that referenced this pull request Aug 25, 2025
stack-info: PR: #2201, branch: drisspg/stack/54
liangel-02 pushed a commit that referenced this pull request Aug 25, 2025