
Conversation

mobicham (Contributor) commented Jun 6, 2025

This PR adds dynamic activation quantization support to the GemLite layout. The new config parameter is mode:

  • mode="weight_only" - no activation quantization (default).
  • mode="dynamic" - uses int8 dynamic quantization for A8W8 and fp8 for A8Wn.
import torch
from transformers import AutoModelForCausalLM, TorchAoConfig
from torchao.quantization import GemliteUIntXWeightOnlyConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"

# 8-bit weights with dynamic int8 activation quantization (A8W8)
quant_gemlite = GemliteUIntXWeightOnlyConfig(
    bit_width=8, group_size=None, mode="dynamic"
)
quant_config = TorchAoConfig(quant_type=quant_gemlite)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="cuda",
    quantization_config=quant_config,
)
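
For comparison, the default weight-only mode skips activation quantization entirely. A minimal sketch, assuming the same config surface as above; bit_width=4 and group_size=64 are illustrative choices, not values from this PR:

# Weight-only (default mode): 4-bit weights, activations stay in fp16 (A16W4).
# bit_width=4 and group_size=64 are example values for illustration only.
quant_gemlite_wo = GemliteUIntXWeightOnlyConfig(
    bit_width=4, group_size=64, mode="weight_only"
)
quant_config_wo = TorchAoConfig(quant_type=quant_gemlite_wo)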

pytorch-bot commented Jun 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2327

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit a94c9cf with merge base dd22777:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Jun 6, 2025
jerryzh168 (Contributor) commented

I think by "static" you meant weight only?

@jerryzh168 added the topic: improvement label on Jun 6, 2025
mobicham (Contributor, Author) commented

Anything else needed to get this merged? Thank you!

jerryzh168 (Contributor) commented

We can merge now; the test failures do not look related.

@jerryzh168 merged commit 0afa4c1 into pytorch:main on Jun 13, 2025 (17 of 19 checks passed)
liangel-02 pushed a commit that referenced this pull request on Aug 25, 2025:
* fix get_plain() with FMA mode

* update

* fix in_features/out_features metadata mismatch

* update gemlite slice test

* add packing_bitwidth support

* add packing_bitwidth support and cleanup

* update default gemlite layout

* cleanup

* fix symmetric use-case and relax _same_meta_data

* _copy() meta data

* fix (4,) in autoquant

* Add dynamic mode in gemlite layout

* mode explanation

Signed-off-by: mobicham <[email protected]>

* use weights_only instead of static

---------

Signed-off-by: mobicham <[email protected]>
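
The commit list above also mentions packing_bitwidth support. A hedged sketch of how it might combine with mode; the exact signature and the values bit_width=4, group_size=64, packing_bitwidth=32 are assumptions for illustration, not confirmed by this conversation:

# Sketch only: packing_bitwidth is taken from the commit list above; its exact
# placement in the config signature is an assumption.
quant_gemlite_packed = GemliteUIntXWeightOnlyConfig(
    bit_width=4, group_size=64, packing_bitwidth=32, mode="dynamic"
)
# Per the PR description, mode="dynamic" with sub-8-bit weights uses fp8 activations (A8W4).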