
Conversation

mobicham (Contributor) commented Jun 6, 2025

This PR adds dynamic activation quantization support to the GemLite layout. The new config parameter is mode:

  • mode="weight_only" - no activation quantization (default).
  • mode="dynamic" - uses int8 dynamic quantization for A8W8 and fp8 for A8Wn.
import torch
from transformers import AutoModelForCausalLM, TorchAoConfig
from torchao.quantization import GemliteUIntXWeightOnlyConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"

# 8-bit weights with dynamic int8 activation quantization (A8W8)
quant_gemlite = GemliteUIntXWeightOnlyConfig(
    bit_width=8, group_size=None, mode="dynamic"
)
quant_config = TorchAoConfig(quant_type=quant_gemlite)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="cuda",
    quantization_config=quant_config,
)
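
For comparison, the default weight-only mode skips activation quantization entirely. A minimal sketch, assuming the same config surface as above; bit_width=4 and group_size=64 are illustrative choices, not values from this PR:

# Weight-only (default mode): 4-bit weights, activations stay in fp16 (A16W4).
# bit_width=4 and group_size=64 are example values for illustration only.
quant_gemlite_wo = GemliteUIntXWeightOnlyConfig(
    bit_width=4, group_size=64, mode="weight_only"
)
quant_config_wo = TorchAoConfig(quant_type=quant_gemlite_wo)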

pytorch-bot commented Jun 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2327

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit a94c9cf with merge base dd22777:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Jun 6, 2025
jerryzh168 (Contributor) commented

I think by "static" you meant weight only?

@jerryzh168 added the topic: improvement label on Jun 6, 2025
mobicham (Contributor, Author) commented

Anything else needed to get this merged? Thank you!

jerryzh168 (Contributor) commented

We can merge now; the test failures do not look related.

@jerryzh168 merged commit 0afa4c1 into pytorch:main on Jun 13, 2025 (17 of 19 checks passed)
liangel-02 pushed a commit that referenced this pull request on Aug 25, 2025:
* fix get_plain() with FMA mode

* update

* fix in_features/out_features metadata mismatch

* update gemlite slice test

* add packing_bitwidth support

* add packing_bitwidth support and cleanup

* update default gemlite layout

* cleanup

* fix symmetric use-case and relax _same_meta_data

* _copy() meta data

* fix (4,) in autoquant

* Add dynamic mode in gemlite layout

* mode explanation

Signed-off-by: mobicham <[email protected]>

* use weights_only instead of static

---------

Signed-off-by: mobicham <[email protected]>
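
The commit list above also mentions packing_bitwidth support. A hedged sketch of how it might combine with mode; the exact signature and the values bit_width=4, group_size=64, packing_bitwidth=32 are assumptions for illustration, not confirmed by this conversation:

# Sketch only: packing_bitwidth is taken from the commit list above; its exact
# placement in the config signature is an assumption.
quant_gemlite_packed = GemliteUIntXWeightOnlyConfig(
    bit_width=4, group_size=64, packing_bitwidth=32, mode="dynamic"
)
# Per the PR description, mode="dynamic" with sub-8-bit weights uses fp8 activations (A8W4).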