Commit 2db6f07

iseeyuan authored and facebook-github-bot committed
Use a symmetric quantization with no clipping error to improve llama perplexity (#5163)
Summary: Refer to pytorch/ao#805 for the details. With this change, the perplexity of a Llama model improves by 4% on wikitext.

Reviewed By: mergennachin, helunwencser

Differential Revision: D62342523

Pulled By: iseeyuan
1 parent: f9da675 · commit: 2db6f07

File tree: 2 files changed (+5, −2 lines)
Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0916b5b29b092afcbf2b898caae49abe80662bac
+c6abf2bd576828dc8ed175fba2c4c1d0d3681a1d

examples/models/llama2/source_transformation/quantize.py

Lines changed: 4 additions & 1 deletion
@@ -73,9 +73,12 @@ def quantize(
         if group_size is None:
             raise Exception("For 8da4w quantization, group size must be specified.")
         from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer
+        from torchao.quantization.quant_primitives import MappingType
 
         model = Int8DynActInt4WeightQuantizer(
-            precision=torch_dtype, groupsize=group_size
+            precision=torch_dtype,
+            groupsize=group_size,
+            mapping_type=MappingType.SYMMETRIC_NO_CLIPPING_ERR,
         ).quantize(model)
         if verbose:
             print("quantized model:", model)
